Hacker News

Why are the checkout pages even public? There's no robots.txt, and a lot of private information is listed publicly.

Shameful. I know little about web development, but this seems rather obvious, even to me.



https://encrypted.google.com/search?q=site:https://coinbase....

Phone numbers, names, addresses, e-mails, etc. are all exposed. This is indeed pretty bad. A lot of people I know who use BTC use it foremost for privacy reasons; it is tremendously ironic how this has worked out.


If you are using BTC for privacy, then using a third-party hosted wallet is not a very good plan.


Presumably you can take your bitcoins out whenever you want if you need to use them anonymously.


This is seller information to tell you who you are paying. There is no data leak here.


Using a robots.txt file to hide data that shouldn't be public in the first place is a rather bad idea. Because robots.txt is itself public, it actually highlights the location of the "private" data.


And, Google had to find a link to this content somehow, which means it's publicly accessible from some Coinbase page.


That is what I am wondering. Here I am building a web site, wondering how to get Google to index pages behind authentication, and here is a site that has everything indexed.


After reading Coinbase's response, it seems that the pages were not linked to by Coinbase, but by users posting links to their checkout page on the Internet.


What's a good alternative to robots.txt?


Proper access control on your web site.


Simply not showing the transaction unless you're logged in as the user the transaction belongs to?
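The ownership check described above can be sketched in a few lines (a minimal illustration; the function and parameter names are assumptions, not Coinbase's actual code):

```python
# Hypothetical access check: a transaction page is visible only to the
# logged-in user who owns that transaction.
def can_view_transaction(session_user_id, transaction_owner_id):
    """Return True only if a user is logged in AND owns the transaction."""
    return session_user_id is not None and session_user_id == transaction_owner_id

print(can_view_transaction(42, 42))    # owner viewing their own page -> True
print(can_view_transaction(None, 42))  # anonymous visitor -> False
print(can_view_transaction(7, 42))     # different logged-in user -> False
```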


You can also specify a robots meta tag inside the page HTML. If you want to block a lot of pages, your best bet is to add it to your master layout or template. You get the same blocking effect as robots.txt, but without exposing a list of blocked pages.

The downside is that Google will still crawl the page and use your bandwidth, but the page won't be indexed.
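For reference, the tag being described is a standard one-liner placed in the page's `<head>` (here in a master layout, as suggested above):

```html
<!-- Tells crawlers not to add this page to their index -->
<meta name="robots" content="noindex">
```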


I would suggest both of these: check the User-Agent for bots, and if it is a bot, send a 404 status and exit before the page needs to load. Also add a meta noindex just in case. Robots.txt DOES NOT prevent indexing, just crawling.
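A framework-agnostic sketch of the two-layer approach above (the bot-token list and function name are illustrative assumptions, not a complete or authoritative crawler list):

```python
# Decide how to answer a request for a private page based on the
# User-Agent header: known crawlers get a 404 before the page is rendered,
# everyone else gets the page with a noindex meta tag as a second layer.
KNOWN_BOT_TOKENS = ("googlebot", "bingbot", "duckduckbot")  # illustrative, not exhaustive

def response_for(user_agent):
    """Return (status, body) for a request with the given User-Agent."""
    ua = (user_agent or "").lower()
    if any(token in ua for token in KNOWN_BOT_TOKENS):
        return 404, ""  # exit before the page needs to load
    # Belt and braces: serve the page with noindex anyway.
    return 200, '<meta name="robots" content="noindex">...page...'

print(response_for("Mozilla/5.0 (compatible; Googlebot/2.1)")[0])  # 404
print(response_for("Mozilla/5.0 (Windows NT 10.0)")[0])            # 200
```

Note that filtering on User-Agent is only a heuristic; anyone can spoof it, so the real protection is still proper access control.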


Blocking googlebot's access to those pages is the easiest. But robots.txt would work, if you just did

User-agent: *
Disallow: /checkouts/



