sfphoton's comments

sfphoton · 2025-05-26T08:02:22 1748246542

How can you calculate required VRAM from precision and parameter number?

Havoc · 2025-05-26T08:48:33 1748249313

Realistically you probably just want to look at the file size on Hugging Face and add ~2 GB for OS/Firefox tabs and a bit for context (depends, but let's say 1-2 GB).

The direct parameter-count conversion math tends to be much less reliable than one would expect once quants are involved.

e.g.

7B @ Q8 = 7.1 GB [0]

30B @ Q8 = 34.6 GB [1]
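The file-size rule of thumb above can be sketched in a few lines. This is only an illustration: the function name and default values are made up here, with the ~2 GB overhead and 1-2 GB context allowance taken from the estimate above.

```python
def estimate_vram_gb(file_size_gb, os_overhead_gb=2.0, context_gb=1.0):
    """Rough memory need: model file size plus headroom.

    os_overhead_gb and context_gb are rule-of-thumb guesses, not measurements.
    """
    return file_size_gb + os_overhead_gb + context_gb


# File sizes quoted above for the two Q8 examples.
for name, size_gb in [("7B @ Q8", 7.1), ("30B @ Q8", 34.6)]:
    low = estimate_vram_gb(size_gb, context_gb=1.0)
    high = estimate_vram_gb(size_gb, context_gb=2.0)
    print(f"{name}: roughly {low:.1f}-{high:.1f} GB")
```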

By the way, you can also roughly estimate the expected output speed if you know the device memory throughput. Note that this doesn't work for MoEs.
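A minimal sketch of that speed estimate, assuming the common heuristic that a dense model has to read all of its weights from memory once per generated token, so memory bandwidth divided by model size gives a rough upper bound. The formula and the 400 GB/s figure are assumptions for illustration, not something stated above, and real throughput is usually lower.

```python
def max_tokens_per_second(bandwidth_gb_s, model_size_gb):
    """Upper-bound estimate for a dense model: bandwidth / bytes read per token.

    Assumes every weight is read once per token, which is why it does not
    carry over to MoE models where only some experts are active.
    """
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


# Hypothetical device with 400 GB/s memory bandwidth and the 7.1 GB Q8 file.
print(f"~{max_tokens_per_second(400.0, 7.1):.0f} tokens/s at best")
```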

Also recently discovered that in CPU mode llama.cpp does memory mapping. For some models it loads less than a quarter into memory.

[0] https://huggingface.co/TheBloke/Llama-2-7B-GGUF/tree/main

[1] https://huggingface.co/TheBloke/LLaMA-30b-GGUF/tree/main

NitpickLawyer · 2025-05-26T08:28:15 1748248095

Rule of thumb is parameter_count * precision. Precision can be any of [32, 16, 8, 4] bits (divide by 8 to get bytes). 32-bit is sometimes used in training (although less now I guess), and rarely in inference. For a while now "full" precision has been 16-bit (fp16, bf16), fp8 is 8-bit, int4 is 4-bit, and so on. Everything that's not "full" precision is also known as quantised. fp8 is a quantised version of the "full" model.

So quick napkin math can give you the VRAM usage for loading the model. 7b can be ~14GB full, 7GB in fp8 and ~3.5GB in 4bit (AWQ, int4, q4_k_m, etc). But that's just to load the model in VRAM. You also need some available VRAM to run inference, and there are a lot of things to consider there too. You need to be able to run a forward pass on the required context, you can keep a kv cache to speed up inference, you can do multiple sessions in parallel, and so on.
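The napkin math above, written out as a small sketch. The function name is hypothetical, and it only covers the weights, not the extra VRAM needed for inference.

```python
def weights_gb(param_count, bits_per_param):
    """Memory for the weights alone: parameters * bits, converted to GB (1e9 bytes)."""
    return param_count * bits_per_param / 8 / 1e9


params_7b = 7e9
for label, bits in [("16-bit (fp16/bf16)", 16), ("8-bit (fp8)", 8), ("4-bit (int4)", 4)]:
    print(f"7B at {label}: ~{weights_gb(params_7b, bits):.1f} GB")
```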

Context length is important to take into account because images take a lot of tokens. So what you could do with a 7b LLM at full precision on a 16GB VRAM GPU might not be possible with a VLM, because the context of your query might not fit into the remaining 2GB.
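To get a feel for how much room the context takes, here is a rough sketch of the usual KV cache size formula for a standard transformer: two tensors (keys and values) per layer, each storing kv_heads * head_dim values per token. The architecture numbers below are assumptions matching the published Llama-2-7B config (32 layers, 32 KV heads, head dimension 128); other models, especially ones using grouped-query attention, will differ.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    """Size of the KV cache: keys + values for every layer and every token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * seq_len * batch_size


# Assumed Llama-2-7B-like shape, fp16 cache, 4096-token context.
size = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)
print(f"KV cache: {size / 2**30:.2f} GiB")
```

With these assumed numbers a full 4096-token context already uses about as much as the 2GB left over in the 16GB example, before counting activations.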

a_t48 · 2025-05-26T08:05:39 1748246739

A float16 is 2 bytes. 7B * 2 bytes = 14GB. I can't say if that's an accurate number, but that's almost certainly how tonii141 calculated it.

sfphoton · 2025-05-26T08:11:55 1748247115

Oh, so FP16 means FloatingPoint16? I'm glad to learn something today, thanks!

sfphoton · on Nov 22, 2024

Author here: I think another takeaway from this story (besides the importance of supply chain security management) is how crucial a defensive computer security architecture is. In a nuclear facility, the only things that could have prevented these attacks are network segregation, password policies, and similar measures.

By the way, the IAEA has great guidance [1] on how to manage computer security in nuclear facilities. If you are interested, I encourage you to read it (or ask me about it).

[1] https://www-ns.iaea.org/downloads/security/security-series-d...

sfphoton · on July 4, 2024

Actually, OpenStreetMap can cover your needs. People regularly upload public GPS traces that you can view, which gives you a rough idea of how popular a path is. But OSM also shows all the paths, so you can avoid the beaten ones as well.

sfphoton · on Jan 2, 2024

For Hungary, there is another one that shows freight trains and single locomotives, too: https://iemig.mav-trakcio.hu/

sfphoton · on Oct 21, 2023

No, Firefox doesn't technically run anything on your machine. It is only that some scripts can write shell code into the "middle click buffer" which the user can unintentionally execute later.