Depending on your usage requirements, Mac Minis running UMA over RDMA are becoming a feasible option. At roughly 1/10 of the cost, you get much more than 1/10 of the performance. (YMMV)
I did not expect this to be a limiting factor in the Mac Mini RDMA setup!
> Thermal throttling: Thunderbolt 5 cables get hot under sustained 15GB/s load. After 10 minutes, bandwidth drops to 12GB/s. After 20 minutes, 10GB/s. Your 5.36 tokens/sec becomes 4.1 tokens/sec. Active cooling on cables helps but you’re fighting physics.
Thermal throttling of network cables is a new thing to me…
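For a rough sanity check on the quoted numbers, here is a back-of-the-envelope sketch. It assumes tokens/sec scales linearly with link bandwidth, which is my assumption and not something the quote claims; `linear_estimate` is a made-up helper name.

```python
# Figures taken from the quote above. The linear-scaling model is an
# assumption for illustration only, not something stated in the source.
BASELINE_GBPS = 15.0   # sustained bandwidth before throttling
BASELINE_TOK_S = 5.36  # tokens/sec at that bandwidth


def linear_estimate(bandwidth_gbps: float) -> float:
    """Tokens/sec if throughput scaled 1:1 with link bandwidth (hypothetical)."""
    return BASELINE_TOK_S * bandwidth_gbps / BASELINE_GBPS


# Bandwidth after 0, 10, and 20 minutes of sustained load, per the quote.
for minutes, gbps in [(0, 15.0), (10, 12.0), (20, 10.0)]:
    print(f"{minutes:>2} min: {gbps:4.1f} GB/s -> {linear_estimate(gbps):.2f} tok/s")
```

Under that assumption, 12GB/s gives about 4.29 tokens/sec and 10GB/s about 3.57. The quoted 4.1 tokens/sec sits between the two, but the quote doesn't say which bandwidth level it corresponds to, so take the comparison loosely.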
I admire the patience of anyone who runs dense models on unified memory. Personally, I would rather feed an entire programming book or code directory to a sparse model, get an answer in 30 seconds, and then use the cloud in the rare cases where that's not enough.
Right, not to "defend" the paper's claims, but it seems to be more like tuning how the leaky bucket leaks, using lossy compression to try to preserve some measure of coherency? Seems to turn on the fixed size summary.