From what I've read, the L1/L2 cache per core is the same and the L3 cache per chiplet is the same, but the cores per chiplet are doubled while the overall chiplet size stays about the same (it's a little bigger, I think).
So the L3 cache didn't get smaller (in area or bytes); there's just less of it per core. L1/L2 is relatively small to begin with, but they did use techniques to shrink it at the expense of performance.
I think the big difference is really the reduction in buffers, queues, etc. needed for a design that scales to the moon. This is likely a major factor for Apple's M series too: Apple's machines are never getting the thermal design needed to clock to 5 GHz, so setting the design target much lower means a smaller core, better power efficiency, less heat, etc. The same thing applies here: you're not running your dense servers at 5 GHz, since there's just not enough capacity to deliver power and remove heat, so a design with a more realistic target clock can be smaller and more efficient.
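To give a rough sense of why lowering the clock target pays off so much, here's a back-of-envelope sketch of the usual dynamic-power argument (P ≈ C·V²·f, with voltage needing to rise roughly with frequency near the top of the V-f curve). The voltages and the linear V-f model below are illustrative assumptions, not measurements of any real core:

```python
def relative_dynamic_power(f_ghz, f_ref_ghz=5.0, v_ref=1.30, v_min=0.80):
    """Relative dynamic power vs. a reference clock (f_ref_ghz).

    Assumes a hypothetical linear V-f curve: voltage ramps from v_min
    (at half the reference clock) up to v_ref (at the reference clock).
    Dynamic power scales as V^2 * f, so we return that ratio.
    """
    half = f_ref_ghz / 2
    v = v_min + (v_ref - v_min) * (f_ghz - half) / half
    return (v / v_ref) ** 2 * (f_ghz / f_ref_ghz)

# Under these made-up numbers, a core targeting 3.5 GHz instead of
# 5 GHz runs at 70% of the clock for roughly 41% of the dynamic power.
print(f"{relative_dynamic_power(3.5):.2f}")  # ~0.41
```

That quadratic voltage term is why a core that never needs to hit 5 GHz can spend so much less on power delivery and heat removal per unit of work, on top of the area savings from smaller buffers.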