
Why not a static cache for anyone not logged in, and only do this check when you're authenticated, which gives access to editing pages?
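Roughly something like this sketch (in Go; renderPage, verifyChallenge, and the "session" cookie name are stand-ins made up for illustration, not anything the Arch Wiki actually runs):

    // Hypothetical sketch: serve anonymous traffic from a static cache and
    // only run the (more expensive) bot check for requests that carry a
    // session cookie, i.e. users who can actually edit pages.
    package main

    import (
    	"net/http"
    	"sync"
    )

    var (
    	cacheMu sync.RWMutex
    	cache   = map[string][]byte{} // rendered page bodies keyed by path
    )

    // renderPage and verifyChallenge stand in for the wiki backend and the
    // Anubis-style check; both names are invented for this sketch.
    func renderPage(path string) []byte        { return []byte("<html>...</html>") }
    func verifyChallenge(r *http.Request) bool { return true }

    func handler(w http.ResponseWriter, r *http.Request) {
    	if _, err := r.Cookie("session"); err != nil {
    		// Not logged in: serve the cached copy, rendering it once if needed.
    		cacheMu.RLock()
    		body, ok := cache[r.URL.Path]
    		cacheMu.RUnlock()
    		if !ok {
    			body = renderPage(r.URL.Path)
    			cacheMu.Lock()
    			cache[r.URL.Path] = body
    			cacheMu.Unlock()
    		}
    		w.Write(body)
    		return
    	}

    	// Logged in: run the challenge check before hitting the live backend.
    	if !verifyChallenge(r) {
    		http.Error(w, "challenge required", http.StatusForbidden)
    		return
    	}
    	w.Write(renderPage(r.URL.Path))
    }

    func main() {
    	http.HandleFunc("/", handler)
    	http.ListenAndServe(":8080", nil)
    }

Anonymous reads never touch the wiki backend after the first render of a path; only cookie-bearing requests go through the challenge and the live renderer.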

edit: Because HN is throwing "you're posting too fast" errors again:

> That falls short of the "meets their needs" test. Authenticated users already have a check (i.e., the auth process). Anubis is to stop/limit bots from reading content.

The Arch Wiki is a high-value target for scraping, so they'll just solve the Anubis challenge once a week. It's not going to stop them.



> The Arch Wiki is a high-value target for scraping, so they'll just solve the Anubis challenge once a week. It's not going to stop them.

The goal of Anubis isn't to stop them from scraping entirely, but rather to slow down aggressive scraping (e.g., sites with lots of pages being scraped every 6 hours [1]) so that the scraping doesn't impact the backend nearly as much.

[1] https://pod.geraspora.de/posts/17342163, which was linked as an example in the original blog post describing the motivation for Anubis [2].

[2] https://xeiaso.net/blog/2025/anubis/


The point of a static cache is that your backend isn't impacted at all.


That falls short of the "meets their needs" test. Authenticated users already have a check (i.e., the auth process). Anubis is to stop/limit bots from reading content.


... Are you saying a bot couldn't authenticate?

You'd still need a layer there; the login could also have been done manually just to pull a session token.


> The Arch Wiki is a high-value target for scraping, so they'll just solve the Anubis challenge once a week.

ISTR that Anubis lets the site owner control the expiry on the check; if you're still getting hit by bots, drop the expiry to 5s with a lower "work" effort, so that every challenge takes (say) 2s and only lasts for 5s.

(That still might not help, though, because it optimises for bots at the expense of humans: a human will only make maybe one actual request every 30-200 seconds, while a bot could do a lot in 5s.)


Rather than a time to live, you probably want a number of requests to live: decrement a counter associated with the token on every request until it expires.

An obvious follow-up is to decrement it by a larger amount if requests are made at a higher frequency.
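A rough sketch of that idea (hypothetical, not how Anubis actually tracks tokens; the budget, the 2s threshold, and the penalty of 5 are made-up numbers):

    // Sketch of a "requests to live" token: each solved challenge grants a
    // request budget, and bot-like request rates burn the budget faster.
    package main

    import (
    	"fmt"
    	"sync"
    	"time"
    )

    type tokenState struct {
    	remaining int       // requests left before the token expires
    	lastSeen  time.Time // time of the previous request with this token
    }

    type TokenStore struct {
    	mu     sync.Mutex
    	tokens map[string]*tokenState
    }

    func NewTokenStore() *TokenStore {
    	return &TokenStore{tokens: make(map[string]*tokenState)}
    }

    // Issue registers a freshly solved challenge with a request budget.
    func (s *TokenStore) Issue(token string, budget int) {
    	s.mu.Lock()
    	defer s.mu.Unlock()
    	s.tokens[token] = &tokenState{remaining: budget, lastSeen: time.Now()}
    }

    // Spend is called on every request: it decrements the counter, with a
    // larger penalty when requests arrive faster than a human plausibly
    // would, and reports whether the token is still valid.
    func (s *TokenStore) Spend(token string) bool {
    	s.mu.Lock()
    	defer s.mu.Unlock()

    	st, ok := s.tokens[token]
    	if !ok {
    		return false // unknown or expired token: re-challenge
    	}

    	cost := 1
    	if time.Since(st.lastSeen) < 2*time.Second {
    		cost = 5 // hypothetical penalty for bot-like request rates
    	}
    	st.lastSeen = time.Now()
    	st.remaining -= cost

    	if st.remaining <= 0 {
    		delete(s.tokens, token) // budget exhausted: force a new challenge
    		return false
    	}
    	return true
    }

    func main() {
    	s := NewTokenStore()
    	s.Issue("abc123", 200) // token issued after a solved challenge
    	fmt.Println(s.Spend("abc123"))
    }

The frequency threshold is the main knob: set it just above the fastest rate a human plausibly clicks through pages, and a scraper burns its budget in seconds while a reader barely notices.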


Does anyone know if static caches work? No one seems to have replied to that point. It seems like a simple and user-friendly solution.


Caches would only work if the bots were hitting routes that any human had ever hit before.


They'd also work if the bot, or another bot, has hit that route before. It's a wiki; the amount of content is finite, and each route getting hit once isn't a problem.



