
It’s interesting that L2 cache has basically been steady at 2 MB/core since 2004 as well. It hasn’t changed speed in that time, but it is still an order of magnitude faster than main memory across that whole timeframe. Does this suggest that the memory-speed bottleneck means there simply hasn’t been a need to make more of that faster cache available?


Every level of cache strikes a balance between latency & capacity. Bigger caches have higher latency; it's a fundamental property of caches.

What you can conclude is that 0.5-2 MB and 12-15 cycles of latency have been a steady sweet spot for L2 for twenty years.
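To put that sweet spot in wall-clock terms, here's a quick back-of-the-envelope conversion (the clock speeds are illustrative assumptions, not figures from this thread):

    # Rough conversion of the "12-15 cycles" L2 sweet spot into nanoseconds.
    # Clock speeds are assumed for illustration only.
    for label, ghz in [("2004-era core, assumed 3.0 GHz", 3.0),
                       ("modern core, assumed 5.0 GHz", 5.0)]:
        for cycles in (12, 15):
            print(f"{label}: {cycles} cycles = {cycles / ghz:.1f} ns")

In absolute time, that keeps L2 in the low single-digit nanoseconds across the whole period.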

Sidebar: it has been a property of caches so far. 3D assembly (die stacking) may upend the local optima.


Some of these numbers are clearly wrong. Some of the old latency figures seem somewhat optimistic (e.g. 100 ns for a main memory reference in 1999), and some of the newer ones pessimistic (e.g. still 100 ns for a main memory reference in 2020). The disk bandwidth is clearly wrong: it claims ~1.2 GB/s for a hard drive in 2020. The seek time is also wrong: the page has it crossing 10 ms in 2000, dropping to 5 ms by 2010 and 2 ms by 2020, which looks like naive interpolation to me. It's also unclear what the SSD data is supposed to mean before ~2008, since SSDs weren't really a commercial product before then. And for 2020 the SSD transfer rate is given as over 20 GB/s and main memory bandwidth as 300+ GB/s.

Cache performance has increased massively, especially bandwidth, which a latency chart doesn't reflect. Bandwidth and latency are of course related: just transferring a 64-byte cache line over a PC66 memory bus takes longer than 100 ns on its own, while the same transfer on DDR5 takes a nanosecond or so, leaving almost all of the latency budget for the memory access itself.
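A back-of-the-envelope check of that claim, using assumed peak bus bandwidths and ignoring command/turnaround overhead:

    # Time to move one 64-byte cache line at the bus's peak data rate.
    # Bandwidths are assumptions: PC66 = 66.6 MHz x 8 B, DDR5-4800 = 4800 MT/s x 8 B.
    CACHE_LINE = 64  # bytes
    buses = {
        "PC66 SDRAM": 66.6e6 * 8,    # ~533 MB/s peak
        "DDR5-4800":  4800e6 * 8,    # ~38.4 GB/s peak
    }
    for name, bytes_per_s in buses.items():
        ns = CACHE_LINE / bytes_per_s * 1e9
        print(f"{name}: ~{ns:.1f} ns per 64-byte line")

That works out to roughly 120 ns for PC66 versus under 2 ns for DDR5, which is the point above.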

edit: https://github.com/colin-scott/interactive_latencies/blob/ma...

The data on this page is simply extrapolated using formulas and guesses.
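For illustration, here is a hypothetical sketch of the kind of year-based formula such a page might use. This is not the repo's actual code, and the base values and growth rates are assumptions chosen only to show how the curves behave:

    # Hypothetical formula-driven extrapolation: values come from a base year
    # and an assumed annual improvement factor, not from measured hardware.
    def extrapolate(base_value, base_year, annual_factor, year):
        return base_value * annual_factor ** (year - base_year)

    disk_seek_ms = extrapolate(10.0, 2000, 0.92, 2020)  # ~1.9 ms "seek" by 2020
    hdd_bw_mb_s  = extrapolate(60.0, 2000, 1.17, 2020)  # ~1.4 GB/s "HDD" by 2020
    print(f"2020 seek: {disk_seek_ms:.1f} ms, HDD bandwidth: {hdd_bw_mb_s / 1000:.1f} GB/s")

Plausible-looking annual improvement rates compound into exactly the kind of implausible 2020 numbers the parent comment points out.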


The oldest latency numbers were based on actual hardware Google had on hand at the time. Most of them came from microbenchmarks checked into google3. Occasionally an engineer would announce they had found an old parameter tuned for the Pentium, and cranking it up got another 1-2% performance gain on important metrics.

Many of the newer numbers could be based on existing Google hardware; for example, Google deployed SSDs in 2008 (custom designed and manufactured even before then) because hard drive latency wasn't getting any better, which opened up a bunch of new opportunities. I worked on a project that wanted to store a bunch of data, and Jeff literally came to me with code that he said "compressed the data enough to justify storing it all on flash, which should help query latency" (this led to a patent!).


Bigger caches could help, but as a rule of thumb the miss rate only falls roughly with the square root of cache size, so the returns diminish. And the bigger you make a cache, the slower it tends to be, so at some point you can make your system slower overall by making the cache bigger and slower.
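A minimal sketch of that rule of thumb, assuming miss rate scales as 1/sqrt(cache size) from a hypothetical baseline:

    # Square-root rule of thumb: doubling the cache cuts misses by only ~1/sqrt(2).
    from math import sqrt

    base_size_kb, base_miss_rate = 512, 0.05  # hypothetical baseline, not measured
    for size_kb in (512, 1024, 2048, 4096, 8192):
        miss_rate = base_miss_rate * sqrt(base_size_kb / size_kb)
        print(f"{size_kb:5d} KB -> miss rate ~{miss_rate:.3f}")

Going from 512 KB to 8 MB (16x the area and power) only cuts the miss rate by 4x in this model.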


The bigger the cache, the longer it takes to address it, and fairly fundamental physics prevents it from being faster.



