It was a pretty easy problem: parsing logs for performance statistics. But moving the data is the easy part, and that's why I was incredulous of the OP's statement.

I'm starting to wonder if this is really "Hacker News" or if it's "we want free advice and comments from engineers on our startups, so let's start a forum with technical articles."



Big Data should be at the petabyte-plus level. Even with 10G Ethernet it takes a lot of bandwidth and time to move things around (and it's very hard to keep a 10G link full at a constant rate from storage). This is hard even for telcos. Note that terabyte-plus data sets today fit on an SSD.
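Back-of-the-envelope, at a sustained 10 Gbit/s line rate and ignoring protocol overhead, the gap between terabyte and petabyte scale is roughly three orders of magnitude in wall-clock time:

    # Rough transfer-time arithmetic at a sustained 10 Gbit/s line rate.
    # (Ignores TCP/IP overhead, so real numbers are worse.)
    LINK_BPS = 10e9                        # 10 Gbit/s

    def transfer_time_seconds(num_bytes):
        return num_bytes * 8 / LINK_BPS    # bits on the wire / line rate

    TB, PB = 1e12, 1e15
    print(f"1 TB: {transfer_time_seconds(TB) / 60:.0f} minutes")   # ~13 minutes
    print(f"1 PB: {transfer_time_seconds(PB) / 86400:.1f} days")   # ~9.3 days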


Not really, "Big Data" has nothing to do with how many bytes you're pushing around.

Some types of data analytics are CPU-heavy and require distributed resources. Your comment about 10G isn't true: you can move a TB over the wire in 13 minutes or so, and SSDs or a medium-sized SAN could easily keep up with that bandwidth.
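On the storage side, a rough sanity check (assuming ~500 MB/s of sequential read per SATA SSD, a typical figure) suggests a small handful of drives can keep a 10 GbE link full:

    # Can commodity SSDs feed a 10 GbE link? A rough sanity check.
    LINK_BYTES_PER_S = 10e9 / 8      # 10 Gbit/s is about 1.25 GB/s
    SSD_SEQ_READ = 500e6             # ~500 MB/s sequential per SATA SSD (assumed)

    drives_needed = LINK_BYTES_PER_S / SSD_SEQ_READ
    print(f"10 GbE needs about {drives_needed:.1f} SSDs of sequential read")  # ~2.5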

If your data isn't latency-sensitive and can be processed in batches, building a Hadoop cluster is a great solution to a lot of problems.
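As a minimal sketch (not tied to any particular cluster, and using a hypothetical "timestamp endpoint latency_ms" log format), the map and reduce steps for a log-parsing job like the one above can be plain Python functions in the Hadoop Streaming style, reading lines and emitting tab-separated key/value pairs:

    # A minimal Hadoop-Streaming-style map/reduce pair for log parsing.
    # Hypothetical log format: "<timestamp> <endpoint> <latency_ms>".
    from itertools import groupby

    def mapper(lines):
        # Map step: emit (endpoint, latency) pairs as tab-separated text.
        for line in lines:
            parts = line.split()
            if len(parts) == 3:
                _, endpoint, latency_ms = parts
                yield f"{endpoint}\t{latency_ms}"

    def reducer(sorted_pairs):
        # Reduce step: average latency per endpoint; input arrives grouped by key.
        keyed = (pair.split("\t") for pair in sorted_pairs)
        for endpoint, group in groupby(keyed, key=lambda kv: kv[0]):
            latencies = [float(v) for _, v in group]
            yield f"{endpoint}\t{sum(latencies) / len(latencies):.1f}"

    if __name__ == "__main__":
        # Local stand-in for the shuffle/sort phase between map and reduce.
        logs = ["1700000000 /search 120", "1700000001 /search 80", "1700000002 /home 30"]
        for line in reducer(sorted(mapper(logs))):
            print(line)

Run under Hadoop Streaming, the two functions would live in separate mapper and reducer scripts, with the framework handling the sort and the distribution of input splits between them.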


Of course big data is about the number of bytes. That's what something like MapReduce helps with: it depends on breaking your input down into smaller chunks, and the number of chunks is certainly related to the number of bytes.
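That relationship is easy to make concrete: the number of map tasks is roughly the input size divided by the split size (128 MB below, a common HDFS default, used here as an assumption):

    # Number of map tasks is roughly input size / split size,
    # so it scales directly with the number of bytes.
    import math

    SPLIT_SIZE = 128 * 1024**2          # 128 MB, a common HDFS block/split size

    def num_splits(input_bytes):
        return math.ceil(input_bytes / SPLIT_SIZE)

    for size in (1e9, 1e12, 1e15):      # 1 GB, 1 TB, 1 PB
        print(f"{size:.0e} bytes -> {num_splits(size):,} map tasks")
    # 1e+09 bytes -> 8 map tasks
    # 1e+12 bytes -> 7,451 map tasks
    # 1e+15 bytes -> 7,450,581 map tasks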



