So the issue is: 1) they were putting ads on top of Craigslist, and 2) they were using up too much bandwidth. I expect that if you actually wanted to build something useful, you could just email Craig and he'd get you a dump.
There have been a few others. One fellow made a search system (using the RSS feeds) that let you search across cities, with checkboxes to select the different areas. Craigslist didn't like that either.
I guess the bigger question is: what would Craigslist like? I'm not sure. Nothing has survived yet, because Craigslist hasn't liked anything anyone has built so far.
Ha, with EC2 you can get around that now. Separate your website IP from your crawling IP, and every time CL blocks an IP, switch to a new one. Eventually they'll have to block the entire AWS range, but that's okay; you can crawl over cable modem/DSL connections that use DHCP. What are they going to do, block Verizon, Comcast and Time Warner?
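A minimal sketch of that rotation idea. The proxy addresses here are made up; in practice they'd be EC2 instances (or DHCP-renewed residential connections) you cycle through whenever CL starts returning 403s:

```python
import itertools

# Hypothetical proxy pool -- stand-ins for EC2 crawler instances you
# spin up and discard as Craigslist blocks each one.
PROXIES = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

def make_rotator(proxies):
    """Return a callable that hands out the next proxy in the pool.

    Call it once to start, then again whenever the current proxy gets
    blocked (HTTP 403, connection refused, etc.).
    """
    pool = itertools.cycle(proxies)
    return lambda: next(pool)

next_proxy = make_rotator(PROXIES)
current = next_proxy()   # start crawling through the first proxy
# ... on a block from CL, just switch:
current = next_proxy()
```

The `cycle` wraps around, so with a pool you replenish (new instances, fresh DHCP leases) you never run dry.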
Then, I think, you can get around Referer [sic] blocks on your links to CL by sending users through a redirect.
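One way to sketch that redirect: an interstitial page that forwards the browser with a meta refresh. Historically most browsers sent no Referer header at all after a meta refresh, so CL would see a blank referrer rather than your site (behavior varies by browser; this is an illustration, not a guarantee):

```python
def deref_page(target_url):
    """Build a tiny interstitial page that forwards the browser to
    target_url via <meta http-equiv="refresh">.

    Unlike an HTTP 3xx redirect (where browsers generally keep sending
    the original Referer), a meta refresh has traditionally caused
    browsers to omit the header entirely.
    """
    return (
        '<html><head>'
        f'<meta http-equiv="refresh" content="0;url={target_url}">'
        '</head><body></body></html>'
    )
```

Your site links to `/out?url=...`, that endpoint serves this page, and the browser hops to CL with no telltale referrer.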
You can slurp the entirety of CL daily without causing them traffic problems; it's equivalent to each page getting one extra page view per day, which is nothing. Just keep track of URLs so you only slurp new content, and serve thumbnails off your own hosts (it's fair use).
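The URL-tracking part is just a persistent seen-set. A minimal sketch (the `seen_urls.json` filename is made up):

```python
import json
import os

SEEN_FILE = "seen_urls.json"  # hypothetical state file for the crawler

def load_seen():
    """Load the set of listing URLs we've already fetched."""
    if os.path.exists(SEEN_FILE):
        with open(SEEN_FILE) as f:
            return set(json.load(f))
    return set()

def new_urls(candidate_urls, seen):
    """Filter an index page's listing URLs down to unseen ones, so each
    listing costs CL exactly one page view, ever. Marks them as seen."""
    fresh = [u for u in candidate_urls if u not in seen]
    seen.update(fresh)
    return fresh

def save_seen(seen):
    """Persist the seen-set between daily runs."""
    with open(SEEN_FILE, "w") as f:
        json.dump(sorted(seen), f)
```

Each daily run loads the set, fetches only what `new_urls` returns, and saves the set back.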
Why would you go to all that effort if they don't want to play with you in the first place? If they really wanted to block you, they'd do it through their lawyers.
It's also very easy to tell a crawler from a user. But they don't even have to do that: blocking the entire AWS IP range isn't hard either, and I doubt any Craigslist users browse the site through AWS :)
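The server-side check is cheap. A sketch using one illustrative CIDR block (Amazon publishes its real ranges, and they change over time, so don't take this one as current):

```python
import ipaddress

# Illustrative EC2 CIDR block only -- the authoritative, ever-changing
# list comes from Amazon.
AWS_RANGES = [ipaddress.ip_network("72.44.32.0/19")]

def is_aws(client_ip):
    """True if the requesting IP falls inside a known AWS range.

    A one-line membership test per range: no human browses Craigslist
    from an EC2 instance, so matches can simply be refused.
    """
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in AWS_RANGES)
```

Drop that check at the front of request handling and the whole EC2-rotation trick dies, at the cost of maintaining the range list.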
http://mashable.com/2007/06/08/listpic-craigslist/