Hacker News

Why are the checkout pages even public? There's no robots.txt, and a lot of private information is listed publicly.

Shameful. I know little about web development, but this seems rather obvious, even to me.



https://encrypted.google.com/search?q=site:https://coinbase....

Phone numbers, names, addresses, e-mails, etc. are all exposed. This is indeed pretty bad. A lot of people I know who use BTC use it foremost for privacy reasons; it is tremendously ironic how this has worked out.


If you are using BTC for privacy, then using a third-party hosted wallet is not a very good plan.


Presumably you can take your bitcoins out whenever you want if you need to use them anonymously.


This is seller information to tell you who you are paying. There is no data leak here.


Using a robots.txt file to hide data that shouldn't be public in the first place is a rather bad idea. Because robots.txt is itself public, it actually highlights the location of the "private" data.


And, Google had to find a link to this content somehow, which means it's publicly accessible from some Coinbase page.


That is what I am wondering. Here I am building a web site, wondering how to get Google to index pages behind authentication, and here is a site that has everything indexed.


After reading Coinbase's response, it seems that the pages were not linked to by Coinbase, but by users posting links to their checkout page on the Internet.


What's a good alternative to robots.txt?


Proper access control on your web site.


Simply not showing the transaction unless you're logged in as the user the transaction belongs to?
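The ownership check described above can be sketched in a few lines (a minimal illustration; the function and parameter names are assumptions, not Coinbase's actual code):

```python
# Hypothetical access check: a transaction page is visible only to the
# logged-in user who owns that transaction.
def can_view_transaction(session_user_id, transaction_owner_id):
    """Return True only if a user is logged in AND owns the transaction."""
    return session_user_id is not None and session_user_id == transaction_owner_id

print(can_view_transaction(42, 42))    # owner viewing their own page -> True
print(can_view_transaction(None, 42))  # anonymous visitor -> False
print(can_view_transaction(7, 42))     # different logged-in user -> False
```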


You can also specify a robots meta tag inside the page HTML. If you want to block a lot of pages, your best bet is to add it to your master layout or template. You get the same blocking effect as robots.txt, but without exposing a list of blocked pages.

The downside is that Google will still crawl the page and use your bandwidth, but the page won't be indexed.
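For reference, the tag being described is a standard one-liner placed in the page's `<head>` (here in a master layout, as suggested above):

```html
<!-- Tells crawlers not to add this page to their index -->
<meta name="robots" content="noindex">
```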


I would suggest both of these: check the User-Agent for bots, and if it is a bot, send a 404 status and exit before the page needs to load. Also add a meta noindex just in case. Robots.txt DOES NOT prevent indexing, just crawling.
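A framework-agnostic sketch of the two-layer approach above (the bot-token list and function name are illustrative assumptions, not a complete or authoritative crawler list):

```python
# Decide how to answer a request for a private page based on the
# User-Agent header: known crawlers get a 404 before the page is rendered,
# everyone else gets the page with a noindex meta tag as a second layer.
KNOWN_BOT_TOKENS = ("googlebot", "bingbot", "duckduckbot")  # illustrative, not exhaustive

def response_for(user_agent):
    """Return (status, body) for a request with the given User-Agent."""
    ua = (user_agent or "").lower()
    if any(token in ua for token in KNOWN_BOT_TOKENS):
        return 404, ""  # exit before the page needs to load
    # Belt and braces: serve the page with noindex anyway.
    return 200, '<meta name="robots" content="noindex">...page...'

print(response_for("Mozilla/5.0 (compatible; Googlebot/2.1)")[0])  # 404
print(response_for("Mozilla/5.0 (Windows NT 10.0)")[0])            # 200
```

Note that filtering on User-Agent is only a heuristic; anyone can spoof it, so the real protection is still proper access control.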


Blocking googlebot's access to those pages is the easiest. But robots.txt would work, if you just did

User-agent: *
Disallow: /checkouts/



