It was a pretty easy problem: parsing logs for performance statistics. But moving the data is the easy part, and that's why I was incredulous of the OP's statement.

I'm starting to wonder if this is really "Hacker News" or if it's "we want free advice and comments from engineers on our startups, so let's start a forum with technical articles."



Big Data should be at the petabyte-plus level. Even with 10G Ethernet it takes a lot of bandwidth and time to move things around (and it's very hard to keep a 10G link full at a constant rate from storage). This is hard even for telcos. Note that terabyte-plus data sets today fit on an SSD.
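Back-of-the-envelope, at a sustained 10 Gbit/s line rate and ignoring protocol overhead, the gap between terabyte and petabyte scale is roughly three orders of magnitude in wall-clock time:

    # Rough transfer-time arithmetic at a sustained 10 Gbit/s line rate.
    # (Ignores TCP/IP overhead, so real numbers are worse.)
    LINK_BPS = 10e9                        # 10 Gbit/s

    def transfer_time_seconds(num_bytes):
        return num_bytes * 8 / LINK_BPS    # bits on the wire / line rate

    TB, PB = 1e12, 1e15
    print(f"1 TB: {transfer_time_seconds(TB) / 60:.0f} minutes")   # ~13 minutes
    print(f"1 PB: {transfer_time_seconds(PB) / 86400:.1f} days")   # ~9.3 days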


Not really, "Big Data" has nothing to do with how many bytes you're pushing around.

Some types of data analytics are CPU-heavy and require distributed resources. Your comment about 10G isn't true: you can move a TB over the wire in 13 minutes or so, and SSDs or a medium-sized SAN could easily keep up with that bandwidth.
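On the storage side, a rough sanity check (assuming ~500 MB/s of sequential read per SATA SSD, a typical figure) suggests a small handful of drives can keep a 10 GbE link full:

    # Can commodity SSDs feed a 10 GbE link? A rough sanity check.
    LINK_BYTES_PER_S = 10e9 / 8      # 10 Gbit/s is about 1.25 GB/s
    SSD_SEQ_READ = 500e6             # ~500 MB/s sequential per SATA SSD (assumed)

    drives_needed = LINK_BYTES_PER_S / SSD_SEQ_READ
    print(f"10 GbE needs about {drives_needed:.1f} SSDs of sequential read")  # ~2.5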

If your data isn't latency-sensitive and can be processed in batches, building a Hadoop cluster is a great solution to a lot of problems.
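As a minimal sketch (not tied to any particular cluster, and using a hypothetical "timestamp endpoint latency_ms" log format), the map and reduce steps for a log-parsing job like the one above can be plain Python functions in the Hadoop Streaming style, reading lines and emitting tab-separated key/value pairs:

    # A minimal Hadoop-Streaming-style map/reduce pair for log parsing.
    # Hypothetical log format: "<timestamp> <endpoint> <latency_ms>".
    from itertools import groupby

    def mapper(lines):
        # Map step: emit (endpoint, latency) pairs as tab-separated text.
        for line in lines:
            parts = line.split()
            if len(parts) == 3:
                _, endpoint, latency_ms = parts
                yield f"{endpoint}\t{latency_ms}"

    def reducer(sorted_pairs):
        # Reduce step: average latency per endpoint; input arrives grouped by key.
        keyed = (pair.split("\t") for pair in sorted_pairs)
        for endpoint, group in groupby(keyed, key=lambda kv: kv[0]):
            latencies = [float(v) for _, v in group]
            yield f"{endpoint}\t{sum(latencies) / len(latencies):.1f}"

    if __name__ == "__main__":
        # Local stand-in for the shuffle/sort phase between map and reduce.
        logs = ["1700000000 /search 120", "1700000001 /search 80", "1700000002 /home 30"]
        for line in reducer(sorted(mapper(logs))):
            print(line)

Run under Hadoop Streaming, the two functions would live in separate mapper and reducer scripts, with the framework handling the sort and the distribution of input splits between them.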


Of course big data is about the number of bytes. That's what something like MapReduce helps with: it depends on breaking your input down into smaller chunks, and the number of chunks is certainly related to the number of bytes.
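That relationship is easy to make concrete: the number of map tasks is roughly the input size divided by the split size (128 MB below, a common HDFS default, used here as an assumption):

    # Number of map tasks is roughly input size / split size,
    # so it scales directly with the number of bytes.
    import math

    SPLIT_SIZE = 128 * 1024**2          # 128 MB, a common HDFS block/split size

    def num_splits(input_bytes):
        return math.ceil(input_bytes / SPLIT_SIZE)

    for size in (1e9, 1e12, 1e15):      # 1 GB, 1 TB, 1 PB
        print(f"{size:.0e} bytes -> {num_splits(size):,} map tasks")
    # 1e+09 bytes -> 8 map tasks
    # 1e+12 bytes -> 7,451 map tasks
    # 1e+15 bytes -> 7,450,581 map tasks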



