Phone numbers, names, addresses, e-mails, etc. are all out. This is indeed pretty bad. A lot of people I know who use BTC use it foremost for privacy reasons, so it is tremendously ironic how this has worked out.
Using a robots.txt file to hide data that shouldn't be public in the first place is a rather bad idea. Because robots.txt itself is public, it actually highlights the location of the "private" data.
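To make that concrete, here is a rough sketch of how little effort it takes to turn a robots.txt into a list of "interesting" paths. The file contents and the function name `disallowed_paths` are made up for illustration and are not taken from any real site.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every path named in a Disallow rule, in file order."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, _, value = line.partition(":")
        # An empty Disallow value means "allow everything", so skip it.
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt, invented for this example.
EXAMPLE = """\
User-agent: *
Disallow: /checkouts/    # meant to be private
Disallow: /admin/reports/
"""

print(disallowed_paths(EXAMPLE))
```

Anyone can fetch the file and run something like this, so every Disallow line doubles as a pointer to whatever it was supposed to hide.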
That is what I am wondering. Here I am building a website and trying to work out how to get Google to index pages behind authentication and such, and here is a site that has had everything indexed.
After reading Coinbase's response, it seems that the pages were not linked to by Coinbase, but by users posting links to their checkout page on the Internet.
You can also specify a meta robots tag inside the page HTML. If you want to block a lot of pages, your best bet would be to add it to your master layout or template. You get the same blocking effect as robots.txt, but without exposing a list of blocked pages.
The downside is that Google will still crawl the page and use your bandwidth, but the page won't be indexed.
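A minimal sketch of the master-layout idea, assuming a site that renders every page through one shared template. The `LAYOUT` template and the `render_page` helper are hypothetical names; a real framework would have its own layout mechanism.

```python
from html import escape
from string import Template

# Hypothetical master layout: every page rendered through it carries the tag,
# so no list of blocked URLs has to be published anywhere.
LAYOUT = Template("""<!DOCTYPE html>
<html>
<head>
  <meta name="robots" content="noindex, nofollow">
  <title>$title</title>
</head>
<body>
$body
</body>
</html>
""")


def render_page(title: str, body_html: str) -> str:
    """Wrap already-trusted body HTML in the shared layout."""
    return LAYOUT.substitute(title=escape(title), body=body_html)


print(render_page("Order details", "<p>Private order details</p>"))
```

The crawler still has to download the page to see the tag, which is the bandwidth cost mentioned above.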
I would suggest doing both of these:
Check the user agent for bots, and if it is a bot, send a 404 header and exit before the page has to load.
Also add a meta noindex tag just in case.
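Roughly what I have in mind, as a framework-free sketch. The token list in `BOT_TOKENS` and the names `looks_like_bot` and `handle_request` are my own assumptions; the list is not exhaustive, and a client can send any User-Agent string it likes, so treat this as a first filter only.

```python
# Assumed substrings of common crawler user agents; not an exhaustive list.
BOT_TOKENS = ("googlebot", "bingbot", "duckduckbot", "baiduspider", "yandex")

# The page itself still carries the noindex tag as a fallback.
PRIVATE_PAGE = (
    "<html><head>"
    '<meta name="robots" content="noindex">'
    "</head><body>Private page</body></html>"
)


def looks_like_bot(user_agent: str) -> bool:
    """Case-insensitive substring match against the known crawler tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in BOT_TOKENS)


def handle_request(user_agent: str) -> tuple[int, str]:
    """Return (status code, body) for a request to the private page."""
    if looks_like_bot(user_agent):
        # Bail out early: the crawler gets a 404 and no page content.
        return 404, ""
    return 200, PRIVATE_PAGE
```

In a real application the same check would sit at the top of the request handler, before any database work or template rendering happens.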
robots.txt DOES NOT prevent indexing, just crawling. A blocked URL can still end up in the index if other pages link to it.
Shameful. I know little about web development but this seems rather obvious, even to me.