
How long did the whole piece take to put together, and what's the rough breakdown of time spent on each component (data wrangling, finding useful sorts, visualizations, write-up)? Thanks!


Full-time, around 6 weeks. The breakdown is hard to pin down.

I wasted a lot of time trying to do things the "traditional" way by loading into SQL, querying, etc., but it was actually much faster to process things in memory (I have a 16 GB machine). The intensive stuff was parallelized in Go and used the ordinary filesystem, with directory prefix tries for performance.
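For anyone unfamiliar with the idea, here is a rough Go sketch of a directory prefix trie combined with parallel writes. It is not the code used for the piece: `shardPath`, `put`, the depth and width of 2, and the sample keys are all hypothetical, and keys are assumed to be ASCII (hex hashes, for example).

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// shardPath maps a key to a nested path whose directory names are successive
// chunks of the key's prefix, e.g. "abcdef" -> root/ab/cd/abcdef.
// depth and width are illustrative parameters, not values from the post.
// Keys are sliced by byte, so ASCII keys are assumed.
func shardPath(root, key string, depth, width int) string {
	parts := []string{root}
	for i := 0; i < depth && (i+1)*width <= len(key); i++ {
		parts = append(parts, key[i*width:(i+1)*width])
	}
	parts = append(parts, key)
	return filepath.Join(parts...)
}

// put writes value at the sharded path for key, creating directories as needed.
func put(root, key string, value []byte) error {
	p := shardPath(root, key, 2, 2)
	if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil {
		return err
	}
	return os.WriteFile(p, value, 0o644)
}

func main() {
	root, err := os.MkdirTemp("", "trie")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	// Hypothetical keys; each goroutine writes one record independently.
	keys := []string{"abcdef", "abzz01", "q"}
	var wg sync.WaitGroup
	for _, k := range keys {
		wg.Add(1)
		go func(k string) {
			defer wg.Done()
			if err := put(root, k, []byte("record for "+k)); err != nil {
				fmt.Println("write failed:", err)
			}
		}(k)
	}
	wg.Wait()

	// On Unix-like systems this prints data/ab/cd/abcdef.
	fmt.Println(shardPath("data", "abcdef", 2, 2))
}
```

Splitting keys across nested directories keeps any single directory small, so lookups by key stay cheap without a database, and independent keys can be written from separate goroutines without coordination.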

Writeup was mostly SW. He's worked on it maybe an afternoon a week for the last month.

I really enjoy visualizations and can iterate extremely fast (e.g. ChordPlot took half an hour). Don't know why Mathematica is not the de facto standard for dataviz people. Tweaking takes a long time, though, and design iterated with me on getting things looking really nice.

All in all, most of my time was spent building tools to easily create multidimensional histograms. The nice thing is that those tools are clearly useful enough that we'll integrate them into Mathematica, so the cost is somewhat amortized.
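To make "multidimensional histogram" concrete, here is a minimal two-dimensional version. The actual tools were built for Mathematica; this Go sketch only illustrates the idea, and `Bin`, `binIndex`, `histogram2D`, the bin widths, and the sample points are all made up.

```go
package main

import (
	"fmt"
	"math"
)

// Bin identifies one cell of a two-dimensional histogram by its bin indices.
type Bin [2]int

// binIndex returns the index of the fixed-width bin containing x,
// with bin 0 covering [0, width).
func binIndex(x, width float64) int {
	return int(math.Floor(x / width))
}

// histogram2D counts how many points fall into each (x, y) cell.
// Only non-empty cells appear in the result.
func histogram2D(points [][2]float64, wx, wy float64) map[Bin]int {
	counts := make(map[Bin]int)
	for _, p := range points {
		counts[Bin{binIndex(p[0], wx), binIndex(p[1], wy)}]++
	}
	return counts
}

func main() {
	// Made-up sample points, binned into 1x1 cells.
	points := [][2]float64{{0.5, 1.5}, {0.7, 1.2}, {2.1, 0.3}}
	fmt.Println(histogram2D(points, 1, 1))
}
```

Using a fixed-size array as the map key extends naturally to more dimensions: widen `Bin` and compute one index per axis.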

NLP took a few weeks of Etienne's time... once again, amortized. Most of that is wrangling, really, and building tools to understand the deficiencies of your training set. Naive Bayes works surprisingly well; the magic is in the tooling and the "human intelligence" you iterate with.
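As a rough illustration of how little machinery Naive Bayes needs, here is a minimal multinomial classifier with add-one smoothing in Go. It is a generic textbook sketch, not the classifier used for the piece; the type, the whitespace tokenizer, the labels, and the training sentences are all invented for the example.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// NaiveBayes is a multinomial Naive Bayes text classifier with add-one smoothing.
type NaiveBayes struct {
	docCount   map[string]int            // label -> documents seen
	wordCount  map[string]map[string]int // label -> word -> occurrences
	totalWords map[string]int            // label -> total word occurrences
	vocab      map[string]bool           // every word seen in training
	docs       int                       // total documents seen
}

func NewNaiveBayes() *NaiveBayes {
	return &NaiveBayes{
		docCount:   map[string]int{},
		wordCount:  map[string]map[string]int{},
		totalWords: map[string]int{},
		vocab:      map[string]bool{},
	}
}

// tokenize lowercases the text and splits on whitespace (deliberately crude).
func tokenize(text string) []string {
	return strings.Fields(strings.ToLower(text))
}

// Train adds one labeled document to the counts.
func (nb *NaiveBayes) Train(label, text string) {
	if nb.wordCount[label] == nil {
		nb.wordCount[label] = map[string]int{}
	}
	nb.docCount[label]++
	nb.docs++
	for _, w := range tokenize(text) {
		nb.wordCount[label][w]++
		nb.totalWords[label]++
		nb.vocab[w] = true
	}
}

// Classify returns the label with the highest log posterior,
// or "" if nothing has been trained.
func (nb *NaiveBayes) Classify(text string) string {
	words := tokenize(text)
	best, bestScore := "", math.Inf(-1)
	for label, n := range nb.docCount {
		score := math.Log(float64(n) / float64(nb.docs)) // log prior
		denom := float64(nb.totalWords[label] + len(nb.vocab))
		for _, w := range words {
			score += math.Log(float64(nb.wordCount[label][w]+1) / denom)
		}
		if score > bestScore {
			best, bestScore = label, score
		}
	}
	return best
}

func main() {
	nb := NewNaiveBayes()
	// Invented toy training data.
	nb.Train("positive", "great fun with friends")
	nb.Train("positive", "love this great day")
	nb.Train("negative", "terrible awful day")
	nb.Train("negative", "hate this awful traffic")
	fmt.Println(nb.Classify("great day with friends")) // prints "positive"
}
```

The model itself is just word counts, which is why most of the effort ends up in inspecting and fixing the training set.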



