
> You don't need Hadoop if you crunch through 100GB.

True, but Hadoop is for the >100TB cases, where we need the throughput and cost-efficient storage (we have multiple >100TB sets and can't afford hundreds of Optanes!). That said, Hadoop is not good for some of the problems culled out of these data sets, and we can't afford Hadoop nodes with lots of RAM and fast disks.


Making up for slow HDD speeds was one of the major reasons Hadoop was invented: if you spread your data across 100 slow disks and then run a MapReduce job that reads from all 100 of them simultaneously, you effectively get 100x the read throughput. Plus, with the rise of Spark, which processes data primarily in RAM, disk speed is no longer the main bottleneck for Big Data.
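To make that 100x figure concrete, here is a rough back-of-envelope sketch in Python. The 150 MB/s per-disk sequential read rate and the 100 TB dataset size are assumptions for illustration only; the comment above just says ">100TB sets".

    # Back-of-envelope for the "100x" claim: parallel reads scale aggregate throughput.
    DISK_SEQ_READ_MB_S = 150        # assumed sustained sequential read of one HDD
    NUM_DISKS = 100                 # data spread across 100 disks, as in the comment
    DATASET_TB = 100                # assumed dataset size for illustration

    dataset_mb = DATASET_TB * 1_000_000                  # 1 TB ~ 10^6 MB
    aggregate_mb_s = DISK_SEQ_READ_MB_S * NUM_DISKS      # all disks read in parallel

    single_disk_hours = dataset_mb / DISK_SEQ_READ_MB_S / 3600
    cluster_hours = dataset_mb / aggregate_mb_s / 3600

    print(f"One disk:  ~{single_disk_hours:.0f} h to scan {DATASET_TB} TB")
    print(f"100 disks: ~{cluster_hours:.1f} h to scan {DATASET_TB} TB")

Under these assumptions a full scan drops from roughly 185 hours on one disk to under 2 hours across 100 disks, which is the throughput argument behind HDFS + MapReduce data locality.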



