The disks are spinning all the time, and most disks see fairly frequent reads to some content or another. A lot of content is very rarely accessed, but almost every disk has some content that does get accessed. If a spinning disk held only frequently accessed content, it would be unable to keep up with the read rate or read throughput; with the mix as it is, things balance out reasonably on average.
Wayback content is on the same disks as most other content, in the form of WARC files, with individual records fetched out of the middle of WARC files via HTTP range request.
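To make the range-request idea concrete, here is a minimal sketch in Python. It is not IA's actual code: the function names, the URL, and the offset/length values are hypothetical, and it assumes each record is stored as its own gzip member (the usual .warc.gz convention), so a byte slice can be decompressed on its own.

```python
import gzip
import urllib.request


def range_header(offset: int, length: int) -> str:
    """Build an HTTP Range value for `length` bytes starting at `offset`.

    The end position in a Range header is inclusive, hence the -1.
    """
    return f"bytes={offset}-{offset + length - 1}"


def fetch_record(url: str, offset: int, length: int) -> bytes:
    """Fetch one compressed record from the middle of a remote file.

    Hypothetical helper: assumes the server honors Range requests and that
    the requested slice is exactly one complete gzip member.
    """
    req = urllib.request.Request(
        url, headers={"Range": range_header(offset, length)}
    )
    with urllib.request.urlopen(req) as resp:
        return gzip.decompress(resp.read())


# Offline demonstration of the same slicing logic: two made-up records,
# each compressed separately and concatenated into one blob.
records = [b"WARC/1.0\r\nrecord one\r\n", b"WARC/1.0\r\nrecord two\r\n"]
members = [gzip.compress(r) for r in records]
blob = b"".join(members)

# An index would normally supply (offset, length); here we compute them.
offset, length = len(members[0]), len(members[1])
second = gzip.decompress(blob[offset:offset + length])
```

The point of the layout is that the reader only needs an offset and a length to pull a single record, without reading or decompressing the rest of the file.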
Note: I work at IA but am not on the core infrastructure team.
Interesting, thanks for the insights! Then would the few-second delay be more a matter of the time it takes to decompress the contents, or of the files being stored on disks that are accessed a lot, or something else? I've always been curious about it.