So the issue is: 1) they were putting ads on top of Craigslist, and 2) they were using up too much bandwidth. I expect that if you actually wanted to build something useful, you could just email Craig and he'd get you a dump.
There have been a few others. One fellow made a search system (using the RSS feeds) that let you search across cities, with checkboxes to select the different areas. Craigslist didn't like that either.
I guess the bigger question is: what would Craigslist like? I'm not sure. Nothing has survived yet, because Craigslist hasn't liked anything anyone has built so far.
Ha, with EC2 you can get around that now. Separate your website IP from your crawling IP, and every time CL blocks an IP, switch to a new one. Eventually they'll have to block the entire AWS range, but that's okay; you can crawl over cable modem/DSL connections that use DHCP. What are they going to do, block Verizon, Comcast and Time Warner?
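A minimal sketch of that rotation idea. The proxy addresses here are made up; in practice they'd be EC2 instances (or DHCP-renewed residential connections) you cycle through whenever CL starts returning 403s:

```python
import itertools

# Hypothetical proxy pool -- stand-ins for EC2 crawler instances you
# spin up and discard as Craigslist blocks each one.
PROXIES = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

def make_rotator(proxies):
    """Return a callable that hands out the next proxy in the pool.

    Call it once to start, then again whenever the current proxy gets
    blocked (HTTP 403, connection refused, etc.).
    """
    pool = itertools.cycle(proxies)
    return lambda: next(pool)

next_proxy = make_rotator(PROXIES)
current = next_proxy()   # start crawling through the first proxy
# ... on a block from CL, just switch:
current = next_proxy()
```

The `cycle` wraps around, so with a pool you replenish (new instances, fresh DHCP leases) you never run dry.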
Then, I think, you can get around Referer [sic] blocks on your links to CL by sending users through a redirect.
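One way to sketch that redirect: an interstitial page that forwards the browser with a meta refresh. Historically most browsers sent no Referer header at all after a meta refresh, so CL would see a blank referrer rather than your site (behavior varies by browser; this is an illustration, not a guarantee):

```python
def deref_page(target_url):
    """Build a tiny interstitial page that forwards the browser to
    target_url via <meta http-equiv="refresh">.

    Unlike an HTTP 3xx redirect (where browsers generally keep sending
    the original Referer), a meta refresh has traditionally caused
    browsers to omit the header entirely.
    """
    return (
        '<html><head>'
        f'<meta http-equiv="refresh" content="0;url={target_url}">'
        '</head><body></body></html>'
    )
```

Your site links to `/out?url=...`, that endpoint serves this page, and the browser hops to CL with no telltale referrer.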
You can slurp the entirety of CL daily without causing them traffic problems; it's equivalent to each page getting one extra page view per day, which is nothing. Just keep track of URLs so you only slurp new content, and serve thumbnails off your own hosts (it's fair use).
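The URL-tracking part is just a persistent seen-set. A minimal sketch (the `seen_urls.json` filename is made up):

```python
import json
import os

SEEN_FILE = "seen_urls.json"  # hypothetical state file for the crawler

def load_seen():
    """Load the set of listing URLs we've already fetched."""
    if os.path.exists(SEEN_FILE):
        with open(SEEN_FILE) as f:
            return set(json.load(f))
    return set()

def new_urls(candidate_urls, seen):
    """Filter an index page's listing URLs down to unseen ones, so each
    listing costs CL exactly one page view, ever. Marks them as seen."""
    fresh = [u for u in candidate_urls if u not in seen]
    seen.update(fresh)
    return fresh

def save_seen(seen):
    """Persist the seen-set between daily runs."""
    with open(SEEN_FILE, "w") as f:
        json.dump(sorted(seen), f)
```

Each daily run loads the set, fetches only what `new_urls` returns, and saves the set back.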
Why would you go to all that effort if they don't want to play with you in the first place? If they really wanted to block you, they'd do it through their lawyers.
It's also very easy to tell a crawler from a user. But they don't even have to do that: blocking the entire AWS IP range isn't hard either, and I doubt any Craigslist users browse the site through AWS :)
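The server-side check is cheap. A sketch using one illustrative CIDR block (Amazon publishes its real ranges, and they change over time, so don't take this one as current):

```python
import ipaddress

# Illustrative EC2 CIDR block only -- the authoritative, ever-changing
# list comes from Amazon.
AWS_RANGES = [ipaddress.ip_network("72.44.32.0/19")]

def is_aws(client_ip):
    """True if the requesting IP falls inside a known AWS range.

    A one-line membership test per range: no human browses Craigslist
    from an EC2 instance, so matches can simply be refused.
    """
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in AWS_RANGES)
```

Drop that check at the front of request handling and the whole EC2-rotation trick dies, at the cost of maintaining the range list.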
http://mashable.com/2007/06/08/listpic-craigslist/