Common Crawl maintains a free, open repository of web crawl data

tony-allan · 2025-03-05T00:44:26 1741135466

I haven't seen this resource before.

You can also search the index directly [1].

I downloaded and ran the example code [2] to look up a URL and fetch its content, and the response was instantaneous!
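
For anyone who wants to see the shape of that lookup-then-fetch flow, here is a minimal sketch in Python. It is not the official example from [2], just my reading of how the index is meant to be used. It assumes the index returns one JSON object per line with `filename`, `offset`, and `length` fields, and that the archive files are served from `https://data.commoncrawl.org/` and honor HTTP Range requests. The function names are my own, and the server may throttle or reject the default urllib user agent.

```python
import gzip
import json
import urllib.parse
import urllib.request

# Index endpoint taken from [1]; the crawl ID changes with each crawl.
INDEX_URL = "http://index.commoncrawl.org/CC-MAIN-2025-08-index"
# Assumption: public host that serves the WARC files named in index records.
DATA_URL = "https://data.commoncrawl.org/"


def build_query(url):
    """Return the index query URL for a page or domain."""
    params = urllib.parse.urlencode({"url": url, "output": "json"})
    return INDEX_URL + "?" + params


def parse_records(body):
    """Parse an index response body: one JSON object per non-empty line."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def byte_range(record):
    """Build the Range header value covering exactly one capture."""
    start = int(record["offset"])
    end = start + int(record["length"]) - 1  # Range end is inclusive
    return f"bytes={start}-{end}"


def fetch_capture(record):
    """Download one capture and decompress it (assumed to be a gzip member)."""
    request = urllib.request.Request(
        DATA_URL + record["filename"],
        headers={"Range": byte_range(record)},
    )
    with urllib.request.urlopen(request) as response:
        return gzip.decompress(response.read())


def lookup_and_fetch(url):
    """Query the index for url and return the raw WARC record of the first hit."""
    with urllib.request.urlopen(build_query(url)) as response:
        records = parse_records(response.read().decode("utf-8"))
    return fetch_capture(records[0])
```

Calling `lookup_and_fetch("ycombinator.com")` should return the raw WARC record as bytes (WARC headers, HTTP headers, then the page body). The Range request is what keeps this cheap: only the bytes for the one capture are downloaded, not the whole archive file.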

  [1] http://index.commoncrawl.org/CC-MAIN-2025-08-index?url=ycombinator.com&output=json
  [2] https://commoncrawl.org/get-started