
Hacking together something to parse text?

I used it to find a house[1]. I needed to rent two apartments close to each other. I wrote some shell/awk scripts that take several real estate web sites as input, parse them, extract all the data about the apartments (including URL, address and price), pipe the addresses into the Google Maps API to find the exact coordinates of every apartment, calculate the distances between them, and produce a list of pairs of apartments sorted by the distance between them. It works incredibly well: it routinely finds houses that are on the same street or apartments that are in the same building.
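A minimal sketch of the last two stages of such a pipeline (pairwise distances, then sorting), not the actual operation-housefinder code. It assumes geocoding has already happened and that each input line holds tab-separated `url`, `price`, `rooms`, `lat`, `lon` fields; that layout and the function name `pairs` are made up for illustration. Distances use the haversine formula with an Earth radius of 6371 km.

```shell
# pairs: read "url<TAB>price<TAB>rooms<TAB>lat<TAB>lon" lines on stdin and
# print every pair of listings as
#   distance_km<TAB>total_price<TAB>total_rooms<TAB>url1<TAB>url2
# with the closest pair first. The field layout is a hypothetical one.
pairs() {
    awk -F'\t' '
    function rad(d) { return d * 3.141592653589793 / 180 }
    # Great-circle distance in km (haversine formula, Earth radius 6371 km).
    function dist(lat1, lon1, lat2, lon2,    a, dlat, dlon) {
        dlat = rad(lat2 - lat1); dlon = rad(lon2 - lon1)
        a = sin(dlat / 2) ^ 2 + cos(rad(lat1)) * cos(rad(lat2)) * sin(dlon / 2) ^ 2
        return 2 * 6371 * atan2(sqrt(a), sqrt(1 - a))
    }
    # Remember every listing, keyed by its line number.
    { url[NR] = $1; price[NR] = $2; rooms[NR] = $3; lat[NR] = $4; lon[NR] = $5 }
    END {
        # Emit each unordered pair exactly once.
        for (i = 1; i <= NR; i++)
            for (j = i + 1; j <= NR; j++)
                printf "%.3f\t%d\t%d\t%s\t%s\n",
                    dist(lat[i], lon[i], lat[j], lon[j]),
                    price[i] + price[j], rooms[i] + rooms[j], url[i], url[j]
    }' | LC_ALL=C sort -n
}
```

Used as `pairs < listings.tsv | head`, where `listings.tsv` is a hypothetical file in the layout above, this shows the closest pairs first. Two listings in the same building geocode to the same point and therefore come out at distance 0.000.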

The first version of the code was written when I was in Vienna, looking for a house. It took me an evening.

The only thing that comes even close to this in power is Perl, but I don't like Perl. I think it's overkill when you only need to parse text rather than do some other computation, and I did all this work under Plan 9, which doesn't have Perl anyway.

Oh, and what was most useful after all this was creating one-liners for filtering, like finding pairs that had a total cost under some value, a total number of rooms above some value and a distance between them in some interval. I found these one-liners to be easier in awk than in Perl.
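A filter of that kind might look like the sketch below. It assumes a pair list with tab-separated `distance_km`, `total_price`, `total_rooms`, `url1`, `url2` columns; that layout, the function name `filter_pairs` and the thresholds are hypothetical, not taken from the original scripts.

```shell
# filter_pairs MAXCOST MINROOMS MINDIST MAXDIST
# Keep pairs whose total cost is under MAXCOST, whose total room count is
# above MINROOMS, and whose distance lies in [MINDIST, MAXDIST] km.
# Assumed columns: distance_km, total_price, total_rooms, url1, url2.
filter_pairs() {
    awk -F'\t' -v maxcost="$1" -v minrooms="$2" -v lo="$3" -v hi="$4" \
        '$2 < maxcost && $3 > minrooms && $1 >= lo && $1 <= hi'
}
```

The same thing typed ad hoc at the prompt is a single pattern with no action, for example `awk -F'\t' '$2 < 1600 && $3 > 4 && $1 >= 0.1 && $1 <= 1.0' pairs.tsv`, where `pairs.tsv` is a hypothetical file in the layout above. Awk prints every line for which the pattern is true.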

Your argument against awk because it's 2011 is the same as saying let's drop C because it's 2011. Both C and awk solve some things well; for other things there are other languages.

[1] http://code.google.com/p/operation-housefinder/



I find that any argument along the lines of "Oh goodness, it's [YEAR] for crying out loud" is generally rubbish. Not always, but often enough that I've noticed the correlation.

I know Perl, I know Ruby, I know Brainfuck and I know Awk. I'm also not a masochist. I don't use Brainfuck, but I use the hell out of Awk. Other Awk users like 4ad here and I aren't using Awk because we're crazy; we've weighed the effort, rewards and tradeoffs and arrived at a solution.

I find people suggesting that others should or should not use various tools to be incredibly condescending. In doing so you are effectively refusing to recognize others as your peers. If you're Dijkstra and everyone around you has a hardon for GOTO, then be my guest, but you are not.



