My guess is that scientific computation is a lot of fun. Plenty of clean mathema...

scott_s · on Nov 5, 2009

There is lots and lots of "dingy boiler room stuff" in scientific computing. What the math is doing is clean. Doing that as efficiently as possible in a computer is generally not.

easp · on Nov 7, 2009

And there is all the data handling/mangling that needs to be done.

adw · on Nov 5, 2009

Oh man. Nonono.

It's huge amounts of glue and connect-this-program-to-that-program-with-a-pile-of-regexes. Most of the number-crunching is in hand-optimized C or F90...

During my PhD, I designed/wrote a transition-state prediction algorithm in Python, hooking it up to atomistics codes written in Fortran. One of my colleagues, now one of my cofounders at Timetric, wrote a standards-compliant XML library in ANSI F95 - http://uszla.me.uk/FoX/ - which, believe it or not, is one of the rare cases where XML made life a lot better!
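For a flavour of that regex glue, here is a minimal sketch in Python. The output format and the names `ENERGY_RE`, `parse_energies`, and `sample_output` are made up for illustration and do not come from any particular atomistics code. The one real detail is that Fortran often prints double-precision numbers with a `D` exponent, which Python's `float()` does not accept.

```python
import re

# Hypothetical log format: lines such as " TOTAL ENERGY =  -0.7623410000D+02".
# The pattern accepts either a D or an E exponent marker.
ENERGY_RE = re.compile(
    r"TOTAL ENERGY\s*=\s*([-+]?\d*\.\d+(?:[DdEe][-+]?\d+)?)"
)


def parse_energies(text):
    """Return every total energy found in the program output as a float."""
    energies = []
    for match in ENERGY_RE.finditer(text):
        # float() rejects Fortran's D exponent, so rewrite it as E first.
        number = match.group(1).replace("D", "E").replace("d", "e")
        energies.append(float(number))
    return energies


# Made-up sample output standing in for what a Fortran code might print.
sample_output = """\
 ITERATION   1
 TOTAL ENERGY =  -0.7623410000D+02
 ITERATION   2
 TOTAL ENERGY =  -0.7624580000D+02
"""

print(parse_energies(sample_output))
```

Multiply that by every quantity and every program in the pipeline and you get the pile.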

whye · on Nov 5, 2009

On a computer, you also have to be concerned about error propagation and efficiency, and that can take what might be a very beautiful, simple solution on paper and explode it into a big ugly mess in code.

That doesn't mean it isn't fun, but it's a lot less clean than you might think.
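A small illustration of the error-propagation point, as a sketch rather than anything from a real codebase: the textbook one-pass variance formula, (sum of squares - (sum)^2 / n) / (n - 1), is exact on paper but subtracts two huge, nearly equal numbers in floating point. Welford's online update is the usual, less pretty, alternative. The function names and the data below are chosen purely for illustration.

```python
def naive_variance(xs):
    """Textbook formula: (sum(x^2) - (sum x)^2 / n) / (n - 1)."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    # Both terms are around 4e18 here, so their difference loses the low digits.
    return (total_sq - total * total / n) / (n - 1)


def welford_variance(xs):
    """Welford's online algorithm: update a running mean and sum of squared deviations."""
    n = 0
    mean = 0.0
    m2 = 0.0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)


# Small spread on top of a large offset; the exact sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print("naive:  ", naive_variance(data))    # not 30 in double precision
print("welford:", welford_variance(data))  # 30.0
```

Same mathematics, but the version that survives contact with double precision is the one with the extra bookkeeping.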

bravura · on Nov 5, 2009

In NLP there is a large amount of boilerplate and preprocessing code. The barrier to entry is actually quite high. New machine learning model for machine translation? "Sorry, I'm not convinced your model would work with a sophisticated multitext grammar using available translation lexicons etc etc".

Building a convincing baseline is hard, which means that it is difficult to show your approach works in general.

jlees · on Nov 6, 2009

The Python NLTK helps a fair bit with some of that, especially at an introductory level.

Of course, I hand-wrote a lot of my NLP algorithms in Perl, back in the day. Now that's a good use of a time machine...

the_real_r2d2 · on Nov 6, 2009

For beginners there is also an on-line book:

http://www.nltk.org/book

jlees · on Nov 6, 2009

As well as a printed version, published by O'Reilly, for those of you who prefer dead trees to pixels (like me).