
Close, but robots.txt was originally for web crawlers, to reduce accidental denial-of-service attacks. It had nothing to do with scraping (i.e. downloading content and parsing the HTML tags programmatically).
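For what it's worth, this is roughly what a well-behaved crawler does before fetching a page: it parses robots.txt and checks each URL against the rules. A minimal sketch using Python's standard-library `urllib.robotparser` (the robots.txt content, user-agent name, and URLs here are made up for illustration):

```python
# Sketch: how a polite crawler consults robots.txt before fetching.
# The rules, user-agent, and URLs below are hypothetical examples.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())  # normally rp.set_url(...) + rp.read()

print(rp.can_fetch("MyCrawler", "https://example.com/public/page.html"))   # True
print(rp.can_fetch("MyCrawler", "https://example.com/private/data.html"))  # False
print(rp.crawl_delay("MyCrawler"))  # 10 (seconds between requests)
```

The Crawl-delay directive is the anti-DoS part: it asks crawlers to space out their requests, which is exactly the accidental-overload problem the file was invented to solve.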


What do you think a search engine’s crawler bot is doing exactly? I could sure be wrong, but I have a hunch that “downloading content and parsing the HTML tags in a programmatic manner” describes it.


Yes, but the difference is that the term "scraping" also covers things like automatically generating RSS feeds from HTML pages, which is not what robots.txt was written for.
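To make the distinction concrete: this kind of scraping means pulling structured data (headline links, say) out of a page's HTML, the first step toward building an RSS feed from a site that doesn't offer one. A hedged sketch using the standard-library `html.parser`; the HTML snippet and the `class="headline"` convention are invented for the example:

```python
# Sketch: extract (href, title) pairs from HTML, as an RSS-generating
# scraper might. The markup and "headline" class are hypothetical.
from html.parser import HTMLParser

class HeadlineExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []           # (href, title) pairs found so far
        self._current_href = None # href of the <a> tag we are inside

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            if attrs.get("class") == "headline":
                self._current_href = attrs.get("href")

    def handle_data(self, data):
        if self._current_href is not None:
            self.links.append((self._current_href, data.strip()))
            self._current_href = None

html = ('<ul><li><a class="headline" href="/post/1">First post</a></li>'
        '<li><a class="headline" href="/post/2">Second post</a></li></ul>')

extractor = HeadlineExtractor()
extractor.feed(html)
print(extractor.links)  # [('/post/1', 'First post'), ('/post/2', 'Second post')]
```

Nothing here follows links or re-crawls a site, which is why some people argue robots.txt (a crawler-traversal protocol) doesn't squarely apply to it.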


I thought robots.txt covered all automated, programmatic access by third parties where a bot slurps stuff and follows links, without splitting hairs about it.

But what do I know, the young whippersnappers will just word lawyer me to death, so I better shut up and go away.



