Hacker News | Cane_P's comments

SOCAMM will enter production at the end of the year:

https://www.tomshardware.com/pc-components/dram/nvidia-repor...

The CEO of SK Hynix has confirmed that it is in the works:

https://www.mk.co.kr/en/it/11245259

And big companies have started to analyze it, to figure out how it will fit into their domain:

https://x.com/Jukanlosreve/status/1892916771692421228?t=3ikB...


It clearly has two ports. Just look at the right side of the picture:

https://www.storagereview.com/wp-content/uploads/2025/01/Sto...

You will, however, get half the bandwidth and a lot more latency if you have to go through multiple systems.


DIGITS isn't that impressive... It is an RTX 5070 Ti laptop GPU (992 TOPS, clocked less than 1% higher to reach 1000 TOPS/1 PFLOP; for reference, the desktop RTX 5090 has 3352 TOPS, more than 3x that), with 128 GB of unified memory.
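A quick sanity check of those ratios, as a minimal sketch. It uses only the TOPS figures quoted above and assumes throughput scales linearly with clock speed, which is a simplification on my part:

```python
# TOPS figures as quoted in the comment above.
LAPTOP_5070_TI_TOPS = 992
DIGITS_TOPS = 1000
DESKTOP_5090_TOPS = 3352

# Assumption: throughput scales linearly with clock speed.
clock_uplift_pct = (DIGITS_TOPS / LAPTOP_5070_TI_TOPS - 1) * 100

# How much faster the desktop RTX 5090 is than the claimed DIGITS figure.
desktop_ratio = DESKTOP_5090_TOPS / DIGITS_TOPS

print(f"clock uplift needed: {clock_uplift_pct:.2f}%")
print(f"RTX 5090 desktop vs DIGITS: {desktop_ratio:.2f}x")
```

The uplift comes out below 1% and the desktop card ends up a bit over 3x, matching the figures in the text.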

Just because Jensen calls it a super computer and gives it a DGX-1 design, doesn't make it one.

In the Cleo Abram interview [1], Jensen said that DIGITS is 6 times more powerful than the first DGX-1.

According to this PDF [2], DGX-1 had 170 TFLOPS of FP16 (half precision). 170x6=1020 TFLOPS (~1 PFLOP). Yes, DIGITS is supposed to have 1 PFLOP, but according to the presentation, that figure is in FP4...

He also said that it will draw 10k times less power. But DGX-1 had a TDP of 3.5kW [3] and I highly doubt DIGITS will draw 3500/10000=0.35W... the GPU alone will have a peak TDP that is more like 200 times higher than that.
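The arithmetic behind those two claims can be sketched as follows. All inputs are the figures quoted above; the variable names are mine:

```python
# DGX-1 figures cited in the comment (refs [2] and [3]).
DGX1_FP16_TFLOPS = 170
DGX1_TDP_WATTS = 3500

# Claims attributed to the interview [1].
CLAIMED_SPEEDUP = 6
CLAIMED_POWER_REDUCTION = 10_000

implied_tflops = DGX1_FP16_TFLOPS * CLAIMED_SPEEDUP          # 1020 TFLOPS, ~1 PFLOP
implied_power_w = DGX1_TDP_WATTS / CLAIMED_POWER_REDUCTION   # 0.35 W

# "More like 200 times higher" than 0.35 W, as estimated for the GPU alone.
estimated_gpu_peak_w = implied_power_w * 200

print(f"implied throughput: {implied_tflops} TFLOPS")
print(f"implied power draw: {implied_power_w:.2f} W")
print(f"estimated GPU peak: {estimated_gpu_peak_w:.0f} W")
```

So the performance claim lines up with 1 PFLOP, while the power claim would put the whole box at a third of a watt.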

I mean, we all know that NVIDIA does fudge the numbers in charts. Like comparing FP8 on the last generation to FP4 on this one. But this is extreme.

Having said that, do I believe that they can deliver a laptop (in another form factor) that performs 1 PFLOP of FP4? Of course! Like I said, it is nothing special. Both Apple and AMD have unified memory in relatively cheap systems.

1. https://youtu.be/7ARBJQn6QkM

2. http://images.nvidia.com/content/technologies/deep-learning/...

3. https://images.nvidia.com/content/pdf/dgx1-v100-system-archi...


Seeing as it is going to deliver 1 PFLOP, it will need memory of similar speed to the "native" (GDDR) counterpart; otherwise it will only be able to hit that performance as long as all data is in the cache...

My guess is that they will use the RTX 5070 Ti laptop version (992 TFLOPS, clocked slightly higher to reach 1000 TFLOPS/1 PFLOP).

Their big GB200 chips have 546 GB/s to their LPDDR memory, and they could use the same memory controller on the GB10. They don't need to design a new one. It would still be slower than what they are currently using on the RTX 5070 Ti laptop GPU, but any slower than that, and there is no chance that they could argue that it would hit anywhere near 1 PFLOP of FP4. That would only be possible in extreme edge cases where all the data fits in its 40 MB L2 cache.
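One way to put a number on that is the simple roofline model, where attainable throughput is the smaller of peak compute and bandwidth times arithmetic intensity. This is a rough sketch only: the 546 GB/s figure for GB10 is the guess made above, not a published spec, and the function name is mine.

```python
PEAK_FLOPS = 1e15             # 1 PFLOP of FP4, the claimed DIGITS peak
MEM_BANDWIDTH_BYTES = 546e9   # assumption: GB10 reuses the GB200 LPDDR figure


def attainable_flops(intensity, peak=PEAK_FLOPS, bandwidth=MEM_BANDWIDTH_BYTES):
    """Roofline estimate: min(compute roof, memory roof).

    intensity is arithmetic intensity in FLOP per byte moved from memory.
    """
    return min(peak, bandwidth * intensity)


# Intensity at which the two roofs meet; below it the chip is memory-bound.
ridge_point = PEAK_FLOPS / MEM_BANDWIDTH_BYTES

print(f"ridge point: {ridge_point:.0f} FLOP/byte")
for intensity in (1, 10, 100, 1000, 10_000):
    tflops = attainable_flops(intensity) / 1e12
    print(f"{intensity:>6} FLOP/byte -> {tflops:,.1f} TFLOPS")
```

Under these assumptions a kernel needs on the order of 1,800 FLOP per byte fetched from memory before the 1 PFLOP peak is reachable, which is why data that stays in cache matters so much here.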


I think you have the reasoning backwards, there's no "must" here. Historically there are lots and lots of systems which have struggled to approach their peak FLOPS in real-world apps due to off-chip bottlenecks.


They won't have different models, other than in how much storage you want (up to 4 TB; we don't know the smallest option they will sell) and the cabling necessary for connecting two DIGITS (it won't be included in the box).

We already know that it is going to be one single CPU and GPU and fixed memory. The GPU is most likely the RTX 5070 Ti laptop model (992 TFLOPS, clocked 1% higher to get 1 PFLOP).


You can connect two and get 256 GB. But it will still not be enough to run it in native format. You will still need to use a lower quant.
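As a rough illustration of why the quantization level decides whether a model fits, here is a minimal sketch. The 671-billion parameter count is a hypothetical stand-in (the model is not named in this comment), and the estimate covers weights only, ignoring KV cache and activations:

```python
TOTAL_MEMORY_GB = 256          # two linked DIGITS units, 128 GB each
HYPOTHETICAL_PARAMS = 671e9    # assumption for illustration, not from the comment


def weights_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4, 3, 2):
    size = weights_gb(HYPOTHETICAL_PARAMS, bits)
    verdict = "fits" if size <= TOTAL_MEMORY_GB else "does not fit"
    print(f"{bits:>2}-bit: {size:7.1f} GB -> {verdict}")

# Largest average bits per weight that would still fit in 256 GB.
max_bits = TOTAL_MEMORY_GB * 1e9 * 8 / HYPOTHETICAL_PARAMS
print(f"max bits per weight: {max_bits:.2f}")
```

With that hypothetical size, even 4-bit weights would overflow 256 GB, so only a quant of roughly 3 bits or lower would fit.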


CPU (20 ARM cores), GPU (1 PFLOP of FP4) and memory (128 GB) seem fixed, so the only configurable parts would be storage (up to 4 TB) and cabling (if you want to connect two DIGITS).

We kind of know what storage costs in a store, and we know that Apple (Mac computers) and every phone manufacturer add a ton of cost for a small increase. NVIDIA will probably do the same.

I have no idea what their cabling would cost, but the cables exist in 100G, 200G, 400G and 800G speeds and you seem to need two of them.

If you are only going to use one DIGITS, and you can make do with whatever is the smallest storage option, then it is $3000. Many people might have another computer (set up FTP/SMB or a similar solution), a NAS or a USB thumb drive/external hard drive where they can store extra data, and in that case you can have more storage without paying for more.


If you read their documentation, you see that what they are referring to is running 30k instances in parallel.

"Now, let’s turn off the viewer, and change batch size to 30000 (consider using a smaller one if your GPU has a relatively small vram): ...

Running the above script on a desktop with RTX 4090 and 14900K gives you a futuristic simulation speed – over 43 million frames per second, this is 430,000 faster than real-time. Enjoy!"

https://genesis-world.readthedocs.io/en/latest/user_guide/ge...
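To see what the headline figure means per instance, the quoted numbers can be broken down as below. This is a sketch using only the figures in the quote; the 100 frames per simulated second is inferred from the ratio of the two quoted numbers rather than taken from the documentation.

```python
TOTAL_FPS = 43e6             # aggregate frames per second from the quote
REALTIME_FACTOR = 430_000    # "faster than real-time" figure from the quote
BATCH_SIZE = 30_000          # parallel environments in the quoted script

# Inferred: frames that make up one second of simulated time.
frames_per_sim_second = TOTAL_FPS / REALTIME_FACTOR

# Throughput of a single environment within the batch.
per_env_fps = TOTAL_FPS / BATCH_SIZE
per_env_realtime_factor = per_env_fps / frames_per_sim_second

print(f"frames per simulated second: {frames_per_sim_second:.0f}")
print(f"per-environment fps: {per_env_fps:.0f}")
print(f"per-environment speedup: {per_env_realtime_factor:.1f}x real time")
```

Each individual environment runs at roughly 14x real time; the 430,000x figure is the sum over the whole batch.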


1 kilocalorie (often just called a Calorie) is the amount of energy needed to raise the temperature of 1 litre (about 1 kg) of water by 1 degree Celsius.
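In code form, as a minimal sketch: it uses the thermochemical conversion of 4184 J per kilocalorie and treats the specific heat of water as constant, which is only approximately true across temperatures. The function name is mine.

```python
JOULES_PER_KCAL = 4184.0   # thermochemical kilocalorie
WATER_KG_PER_LITRE = 1.0   # approximation for liquid water


def kcal_to_heat_water(litres, delta_celsius):
    """Approximate energy in kcal to warm water by delta_celsius degrees."""
    mass_kg = litres * WATER_KG_PER_LITRE
    return mass_kg * delta_celsius  # ~1 kcal per kg per degree Celsius


one_litre_one_degree = kcal_to_heat_water(1, 1)
kettle = kcal_to_heat_water(1.5, 80)  # e.g. 1.5 L from 20 to 100 degrees C

print(f"1 L by 1 C: {one_litre_one_degree} kcal "
      f"= {one_litre_one_degree * JOULES_PER_KCAL:.0f} J")
print(f"1.5 L by 80 C: {kettle} kcal = {kettle * JOULES_PER_KCAL / 1000:.0f} kJ")
```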

