
It would not be trivial to do.

GDDR achieves higher speeds than normal DDR mainly by specifying much tighter tolerances on the electrical interface and by using a wider interface to the memory chips. This means that with commodity GDDR (which is the only fast DRAM that is reasonably cheap), you face fairly strict limits on the maximum amount of RAM you can use with the same GPUs that are manufactured for consumer use. (Typically at most a 4x difference between the lowest reasonable configuration and the highest-end one: 2x from higher-density modules and 2x from a clamshell memory configuration, although often only one type of module exists for a new memory-interface generation.)
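To put numbers on that 4x window, here's a back-of-the-envelope sketch in Python. The bus width and densities are assumptions for a 3090/4090-class die, not something from the comment above:

    # GDDR6/GDDR6X chips have a 32-bit interface, so the die's bus width
    # fixes the chip count; capacity then only scales with per-chip
    # density and with clamshell mode (two chips sharing one channel).
    BUS_WIDTH_BITS = 384        # assumed: a 3090/4090-class die
    CHIP_INTERFACE_BITS = 32    # standard for GDDR6/GDDR6X

    chips = BUS_WIDTH_BITS // CHIP_INTERFACE_BITS   # 12 chips

    for density_gbit in (8, 16):      # 1 GB or 2 GB per chip
        for clamshell in (1, 2):      # clamshell doubles the chip count
            capacity_gb = chips * clamshell * density_gbit // 8
            print(density_gbit, "Gb chips x", chips * clamshell,
                  "->", capacity_gb, "GB")
    # Prints 12, 24, 24, 48 GB -- the ~4x spread described above.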

If the product requires either a new memory or a new GPU die configuration, its cost will be very high.

The only type of memory that supports very different VRAM sizes on an efficiently utilized bus of the same width is HBM, and so far that is limited to the very high end.



AnandTech has an article on the GDDR6X variant[1] that NVIDIA uses in its RTX 3000-series cards, where a more complex encoding transmits two bits per clock edge.
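Roughly: GDDR6X swaps NRZ (one bit per symbol) for PAM4 (four voltage levels, so two bits per symbol). A toy illustration, ignoring the transition-coding details the article covers; the level values are illustrative:

    # Gray-coded mapping of bit pairs to four voltage levels.
    PAM4_LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): +1, (1, 0): +3}

    def pam4_encode(bits):
        assert len(bits) % 2 == 0
        return [PAM4_LEVELS[(bits[i], bits[i + 1])]
                for i in range(0, len(bits), 2)]

    print(pam4_encode([1, 0, 0, 1, 1, 1, 0, 0]))
    # 4 symbols carry 8 bits; NRZ would need 8 symbols.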

I hadn't realized just how insane the bandwidth on the higher-end cards is, the 3090 being just shy of 1 TB/s. Yes, one terabyte per second...

For comparison, a couple of DDR5 sticks[2] will get you just north of 70 GB/s...
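The arithmetic checks out if you plug in the usual published figures (384-bit bus at 19.5 Gbps/pin for the 3090, two 64-bit DDR5-4800 channels for the sticks):

    def bandwidth_gbs(bus_bits, rate_gtps):
        # bytes/s = (bus width in bytes) x (transfers per second)
        return bus_bits / 8 * rate_gtps

    print(bandwidth_gbs(384, 19.5))   # 936.0 GB/s -- "just shy of 1 TB/s"
    print(bandwidth_gbs(128, 4.8))    # 76.8 GB/s, dual-channel DDR5-4800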

[1]: https://www.anandtech.com/show/15978/micron-spills-on-gddr6x...

[2]: https://www.anandtech.com/show/17269/ddr5-demystified-feat-s...


The accelerators aimed squarely at datacenter use rather than gaming are even more ridiculous: Nvidia's H100 has 80 GB of memory running at 3.35 TB/s.
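Running the same formula backwards shows where HBM gets that (assuming the SXM H100's five active 1024-bit HBM3 stacks): the per-pin rate is modest, the bus is just enormous:

    stacks, bits_per_stack = 5, 1024          # assumed SXM H100 config
    bus_bits = stacks * bits_per_stack        # 5120-bit bus
    pin_rate_gbps = 3350 / (bus_bits / 8)     # ~5.2 Gbps per pin
    print(bus_bits, round(pin_rate_gbps, 1))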


Do you happen to know where Apple's integrated approach falls on this spectrum?

I was actually wondering about this the other day. A fully maxed out Mac Studio is about $6K, and it comes with a "64-core GPU" and "128GB integrated memory" (whatever any of that means). Would that be enough to run a decent Llama?


It's certainly enough to run a decent Llama, but it's hardly the most cost-effective option. Apple's approach falls between the low-bandwidth Intel/AMD laptops and the high-bandwidth PCIe HPC components. In a way it's trapped between two markets: ultra-cheap Android/Windows hardware with 4-8 GB of RAM that can still do AI inferencing, and ultra-expensive GPGPU setups that are designed to melt these workloads.

The charitable thing to say is that it performs very favorably against other consumer inferencing hardware. The numbers get ugly fast once you start throwing money at the problem, though.
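For a rough sense of what fits: weights dominate an LLM's footprint, so size is approximately parameter count times bytes per weight (a sketch; KV cache and activation overhead are ignored):

    def weights_gb(params_billion, bits_per_weight):
        return params_billion * bits_per_weight / 8

    for params in (7, 13, 33, 65):            # the original LLaMA sizes
        print(params, "B @ 4-bit:", weights_gb(params, 4), "GB",
              "| fp16:", weights_gb(params, 16), "GB")
    # 65B at 4-bit (~32.5 GB) fits comfortably in 128 GB of unified
    # memory; at fp16 (130 GB) it would not.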


The Mac's "integrated memory" means it's shared between the CPU and GPU. So the GPU can address all of it, and you can load giant (by current consumer GPU standards) models. I have no idea how it actually performs, though.


Well yeah, I guess binned cards come into play; cheaper binned cards have a narrower bus. It seems there are quite a few models that aren't too heavy on compute but require a tonne of VRAM.

It would be nice for Nvidia to release a chip targeted at medium compute/high memory, where even the lowest bin keeps the full 384-bit bus of the 4090. But then, it would be hard to justify financially on their end, I suppose.



