
It would not be trivial to do.

GDDR achieves higher speeds than normal DDR mainly by specifying much tighter tolerances on the electrical interface and by using a wider interface to the memory chips. This means that with commodity GDDR (which is the only fast DRAM that is reasonably cheap), you face fairly strict limits on the maximum amount of RAM you can use with the same GPUs that are manufactured for consumer use. (Typically at most a 4x difference between the lowest reasonable configuration and the highest-end one: 2x from higher-density modules and 2x from a clamshell memory configuration, although often only one type of module exists for a new memory-interface generation.)
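To put numbers on that 4x window, here's a back-of-the-envelope sketch in Python. The bus width and densities are assumptions for a 3090/4090-class die, not something from the comment above:

    # GDDR6/GDDR6X chips have a 32-bit interface, so the die's bus width
    # fixes the chip count; capacity then only scales with per-chip
    # density and with clamshell mode (two chips sharing one channel).
    BUS_WIDTH_BITS = 384        # assumed: a 3090/4090-class die
    CHIP_INTERFACE_BITS = 32    # standard for GDDR6/GDDR6X

    chips = BUS_WIDTH_BITS // CHIP_INTERFACE_BITS   # 12 chips

    for density_gbit in (8, 16):      # 1 GB or 2 GB per chip
        for clamshell in (1, 2):      # clamshell doubles the chip count
            capacity_gb = chips * clamshell * density_gbit // 8
            print(density_gbit, "Gb chips x", chips * clamshell,
                  "->", capacity_gb, "GB")
    # Prints 12, 24, 24, 48 GB -- the ~4x spread described above.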

If the product requires either a new memory or a new GPU die configuration, its cost will be very high.

The only type of memory that supports very different VRAM sizes on an efficiently utilized bus of the same width is HBM, and so far that is limited to the very high end.



AnandTech has an article on the GDDR6X variant[1] that NVIDIA uses in its RTX 3000-series cards, where a more complex encoding transmits two bits per clock edge.
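Roughly: GDDR6X swaps NRZ (one bit per symbol) for PAM4 (four voltage levels, so two bits per symbol). A toy illustration, ignoring the transition-coding details the article covers; the level values are illustrative:

    # Gray-coded mapping of bit pairs to four voltage levels.
    PAM4_LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): +1, (1, 0): +3}

    def pam4_encode(bits):
        assert len(bits) % 2 == 0
        return [PAM4_LEVELS[(bits[i], bits[i + 1])]
                for i in range(0, len(bits), 2)]

    print(pam4_encode([1, 0, 0, 1, 1, 1, 0, 0]))
    # 4 symbols carry 8 bits; NRZ would need 8 symbols.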

I hadn't realized just how insane the bandwidth on the higher-end cards is, the 3090 being just shy of 1 TB/s. Yes, one terabyte per second...

For comparison, a couple of DDR5 sticks[2] will get you just north of 70 GB/s...
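The arithmetic checks out if you plug in the usual published figures (384-bit bus at 19.5 Gbps/pin for the 3090, two 64-bit DDR5-4800 channels for the sticks):

    def bandwidth_gbs(bus_bits, rate_gtps):
        # bytes/s = (bus width in bytes) x (transfers per second)
        return bus_bits / 8 * rate_gtps

    print(bandwidth_gbs(384, 19.5))   # 936.0 GB/s -- "just shy of 1 TB/s"
    print(bandwidth_gbs(128, 4.8))    # 76.8 GB/s, dual-channel DDR5-4800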

[1]: https://www.anandtech.com/show/15978/micron-spills-on-gddr6x...

[2]: https://www.anandtech.com/show/17269/ddr5-demystified-feat-s...


The accelerators aimed squarely at datacenter use rather than gaming are even more ridiculous: Nvidia's H100 has 80 GB of memory running at 3.35 TB/s.
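Running the same formula backwards shows where HBM gets that (assuming the SXM H100's five active 1024-bit HBM3 stacks): the per-pin rate is modest, the bus is just enormous:

    stacks, bits_per_stack = 5, 1024          # assumed SXM H100 config
    bus_bits = stacks * bits_per_stack        # 5120-bit bus
    pin_rate_gbps = 3350 / (bus_bits / 8)     # ~5.2 Gbps per pin
    print(bus_bits, round(pin_rate_gbps, 1))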


Do you happen to know where Apple's integrated approach falls on this spectrum?

I was actually wondering about this the other day. A fully maxed out Mac Studio is about $6K, and it comes with a "64-core GPU" and "128GB integrated memory" (whatever any of that means). Would that be enough to run a decent Llama?


It's certainly enough to run a decent Llama, but it's hardly the most cost-effective option. Apple's approach falls between the low-bandwidth Intel/AMD laptops and the high-bandwidth PCIe HPC components. In a way it's trapped between two markets: ultra-cheap Android/Windows hardware with 4-8 GB of RAM that can still do AI inferencing, and ultra-expensive GPGPU setups that are designed to melt these workloads.

The charitable thing to say is that it performs very favorably against other consumer inferencing hardware. The numbers get ugly fast once you start throwing money at the problem, though.
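For a rough sense of what fits: weights dominate an LLM's footprint, so size is approximately parameter count times bytes per weight (a sketch; KV cache and activation overhead are ignored):

    def weights_gb(params_billion, bits_per_weight):
        return params_billion * bits_per_weight / 8

    for params in (7, 13, 33, 65):            # the original LLaMA sizes
        print(params, "B @ 4-bit:", weights_gb(params, 4), "GB",
              "| fp16:", weights_gb(params, 16), "GB")
    # 65B at 4-bit (~32.5 GB) fits comfortably in 128 GB of unified
    # memory; at fp16 (130 GB) it would not.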


The Mac's "integrated memory" means it's shared between the CPU and GPU. So the GPU can address all of it, and you can load giant (by current consumer GPU standards) models. I have no idea how it actually performs, though.


Well yeah, I guess binned cards come into play; cheaper binned cards have a narrower bus. It seems there are quite a few models that aren't too heavy on compute but require a tonne of VRAM.

It would be nice for Nvidia to release a chip targeted at medium compute/high memory, where even the lowest bin keeps the full 384-bit bus of the 4090. But then, it would be hard to justify financially on their end, I suppose.



