For reference, compared to what the big companies use: an H100 has over 3 TB/s of memory bandwidth. A nice home lab might be built around 4090s — two years old at this point — which have about 1 TB/s.
Apple's chips have the advantage that they can be specced out with tons of RAM, but performance isn't going to be in the same ballpark as even fairly old Nvidia chips.
The cheapest 4090 is EUR 110 less than a complete 32 GB M2 Max Mac Studio where I live. Spec out a full Intel 14700K build (avoiding the expensive 14900) with 32 GB RAM, NVMe storage, case, power supply, motherboard, 10G Ethernet … and we are approaching the cost of the 64 GB M2 Ultra, which has memory bandwidth more comparable to the Nvidia card's, but with more than twice the RAM available to the GPU.
That's my point. I would absolutely be willing to suffer a 20% memory bandwidth penalty if it means I can put 200% more data in the memory buffer to begin with. Not having to page in and out of disk storage quickly makes that 20% irrelevant.
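A back-of-the-envelope sketch of why capacity can trump bandwidth here. The assumption (a common rough model, not a benchmark) is that LLM token generation is memory-bound: each generated token streams roughly the whole set of weights, so decode speed is about bandwidth divided by model size — and once the model no longer fits in VRAM, the PCIe link becomes the effective "bandwidth". All figures below are illustrative spec-sheet assumptions:

```python
# Rough model: for memory-bandwidth-bound decoding, each token reads
# (approximately) all weights once, so tokens/s ~= bandwidth / model size.
# All numbers are illustrative assumptions, not measurements.

def tokens_per_second(model_gb: float, effective_bw_gbs: float) -> float:
    """Upper-bound decode speed when every token streams all weights."""
    return effective_bw_gbs / model_gb

MODEL_GB = 50.0  # hypothetical quantized model, bigger than 2 x 24 GB VRAM

# Fits in the Mac's 64 GB unified memory: full ~800 GB/s (M2 Ultra spec).
mac = tokens_per_second(MODEL_GB, 800.0)

# Overflows 48 GB of VRAM: the spillover streams over PCIe 4.0 x16
# (~32 GB/s), which dominates in this simplified model.
gpu_paging = tokens_per_second(MODEL_GB, 32.0)

print(f"Mac, in-memory:  ~{mac:.1f} tok/s")
print(f"GPUs, paging:    ~{gpu_paging:.2f} tok/s")
```

This is deliberately crude (real systems keep the resident layers fast and only the spillover slow), but it shows why a modest bandwidth deficit matters far less than running out of memory entirely.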
If you have enough 4090s, you don't need to page in and out of disk: everything stays in VRAM and is fast. But it's true that if you just want it to work, and you don't need the fastest perf, Apple is cheaper!
How is that relevant, when the discussion from the start was a cost-benefit comparison between a two-year-old Mac and a two-year-old GPU?
In any case, how are you going to fit 50+ GB into two (theoretically 24 + 24 GB) Nvidia cards without swapping to disk, when the Mac in question has 64 GB (also theoretical) available?
You seem confused. Please feel free to read my post near the top of this very chain of comments, where I specifically compare a Mac Studio to a machine with 6 to 8 Nvidia GPUs. That was the discussion “from the start.”
> In any case how are you going to fit 50+GB in two (theoretically 24+24 GB) Nvidia cards
Is that supposed to be a joke? And relevant to what, exactly?
The parent of my initial comment in this thread said: "For inference, Apple chips are great due to a high memory bandwidth... It's a cost effective option if you need a lot of memory plus a high bandwidth."
My post was attempting to explain at a high level how 1) Apple SoCs do not really have high memory bandwidth compared to a cluster of GPUs, and 2) you can actually build that cluster of GPUs for the same cost or cheaper than a loaded Mac Studio, and it will drastically outperform the Mac.
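The bandwidth claim in point 1 can be made concrete with spec-sheet numbers (assumed figures: ~936 GB/s per RTX 3090, ~800 GB/s for the M2 Ultra; the 0.8 efficiency factor is a hypothetical allowance for tensor-parallel and interconnect overhead, since real scaling is below linear and workload-dependent):

```python
# Spec-sheet memory bandwidth in GB/s (assumed values, not benchmarks).
RTX_3090_BW = 936.0   # per card
M2_ULTRA_BW = 800.0   # unified memory, shared with the CPU

def aggregate_bw(n_cards: int, per_card_bw: float, efficiency: float = 0.8) -> float:
    """Rough aggregate bandwidth of a tensor-parallel GPU cluster.

    `efficiency` is a hypothetical fudge factor for sharding and
    interconnect overhead; real scaling depends on the workload.
    """
    return n_cards * per_card_bw * efficiency

cluster = aggregate_bw(6, RTX_3090_BW)  # the 6-GPU build mentioned above
print(f"6x3090 cluster: ~{cluster:.0f} GB/s aggregate "
      f"(and 6 x 24 = 144 GB VRAM) vs M2 Ultra: {M2_ULTRA_BW:.0f} GB/s, 64 GB")
```

Even with a generous haircut for parallelism overhead, the cluster's aggregate bandwidth is several times the Mac's, which is why it drastically outperforms it when the model is sharded across the cards.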
If you want specifics on how to build such a GPU cluster, you can search for "ROMED8-2T 3090" for some examples.