You can imagine cases where map-reduce is useful with hardly any starting data. If you are analyzing combinations or permutations, an intermediate step can create a massive amount of data even when the initial and final data sets are small.
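As a toy illustration (a minimal in-memory sketch, nothing distributed; the item list and the predicate are made up):

    from itertools import combinations

    items = ["A", "B", "C", "D", "E"]  # small initial data set

    # "map" step: emit every 3-item combination; C(n, 3) grows as n^3,
    # so the intermediate data dwarfs the input as n increases
    intermediate = [frozenset(c) for c in combinations(items, 3)]

    # "reduce" step: keep only combinations passing some predicate,
    # so the final data set is small again
    result = [c for c in intermediate if "A" in c]

    print(len(items), len(intermediate), len(result))  # 5 10 6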


Have you got any links on how to do that? It sounds very much like a problem I am trying to solve just now: combinations of DNA sequences that work together on a sequencing machine.

At the moment I am joining a table of the sequences to itself in MySQL, but after a certain number of self joins the table gets massive. Time to compute is more of a problem than storage space, though, as I am only storing the combinations that work (> 2 mismatches in the sequence). Would MapReduce help in this scenario?
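For reference, the core check per pair looks something like this in Python terms (illustrative only; my real implementation is SQL, and I'm assuming equal-length sequences here):

    from itertools import combinations

    def mismatches(a, b):
        # number of positions where two equal-length sequences differ
        return sum(x != y for x, y in zip(a, b))

    seqs = ["ACGTAC", "ACGTAA", "TGCATG", "TTTTTT"]  # made-up sequences

    # keep only the pairs that work together, i.e. > 2 mismatches
    compatible = [(a, b) for a, b in combinations(seqs, 2)
                  if mismatches(a, b) > 2]

    print(compatible)

The blow-up comes from extending this from pairs to larger combinations.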


If I had your problem, the first thing that I would do is try PostgreSQL to see if it does the joins fast enough. Second thing that I would try is to put the data in a SOLR db and translate the queries to a SOLR base query (q=) plus filter queries (fq=) on top.
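Roughly the shape I mean for the SOLR version (host, core, and field names are invented for illustration):

    import requests

    # a base query (q=) plus filter queries (fq=); Solr caches fq results
    # independently, which is the main reason to split the query this way
    resp = requests.get(
        "http://localhost:8983/solr/sequences/select",
        params={
            "q": "length:6",                  # base query
            "fq": ["gc_content:[40 TO 60]"],  # one or more filter queries
            "wt": "json",
        },
    )
    print(resp.json()["response"]["numFound"])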

Only if both of these fail to provide sufficient performance would I look at a map-reduce solution based on the Hadoop ecosystem. Actually, I wouldn't necessarily use the Hadoop ecosystem: it has a lot of parts/layers, and the newer, generally better parts are not as well known, so it is a bit more leading-edge than lots of folks like. I'd also look at something like Riak http://docs.basho.com/riak/latest/dev/using/mapreduce/ because then you have your data storage and clustering issues solved in a bulletproof way (unlike Mongo), but you can do MapReduce as well.
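To give a feel for Riak's version: a job is just JSON POSTed to its /mapred endpoint. I'm going from memory of the docs linked above, so treat the details as approximate; the bucket name and host are placeholders, and this toy job simply counts the objects in a bucket:

    import requests

    job = {
        "inputs": "sequences",  # hypothetical bucket holding the data
        "query": [
            # map phase: emit a 1 for every stored object
            {"map": {"language": "javascript",
                     "source": "function(v) { return [1]; }"}},
            # reduce phase: built-in sum, so the job counts the objects
            {"reduce": {"language": "javascript",
                        "name": "Riak.reduceSum"}},
        ],
    }

    resp = requests.post("http://localhost:8098/mapred", json=job)
    print(resp.json())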



