This is the thing: I don't understand how the company is going to function once all those folks become multi-millionaires. It is a really odd situation... I don't know if anything quite like it has occurred before. Even G, M and F had slower stock-price trajectories, I think. Nvidia wasn't known for having the best engineers (no disrespect, but I don't think it was as hard to get into as the other big tech companies 4-5 years back). I heard a story that at Microsoft back in the old days, people would wear a badge saying FUIV (FU, I'm vested).
When I was a kid, going to a restaurant was a treat and a privilege. If my sister or I misbehaved, we were taken out to the car, and might not get to go to a restaurant again for a while.
I see kids in restaurants these days and mostly find their behavior appalling. And it's sad the best-behaved kids are only quiet because they have an iPad in front of them. (No headphones, of course, so that's another annoyance the rest of us have to put up with.)
When I was a kid the only restaurant I was allowed to go to was Friendly's, and only on my or my siblings' birthdays. It was a HUGE treat and I knew to be on my very best behavior, because if there was any acting up, even a little, I wouldn't be allowed to eat out ever again.
Nowadays kids aren't expected to behave in a restaurant, so they don't. It's about expectations.
I've heard about the lotto system but assumed the school district would be obligated to bus kids (i.e. not force them to use public transit alone). The parents have issues with the school bus??
I visited Seattle proper recently and also felt depressed. Not sure why I got that vibe. On paper Seattle is great, and no doubt the Pacific Northwest has great nature and a strong tech industry. It was odd that Seattle still felt "different".
The Seattle Freeze seems like one of those broad stereotypes, but I experienced it as very real. People are not unfriendly, but they are unsocial. I felt lonelier than any other place I've lived. There are of course many other factors, but I'm not the only one, which is validating.
I'm extremely miffed. Mulling leaving the country at some point in the near future, and that triggers a much higher tax bill (considered a deemed disposition). There is just no way to get ahead in this place.
I did not invest in the stock market properly over the last decade (mid-40s; tech income, but I just put it toward paying off the house; I estimate this cost me at least 5 million bucks, conservatively). I'm opening a brokerage account and will slow down paying off the house.
I am confused about how all these things are able to interoperate. Are the creators of these models following the same I/O conventions for their models? Won't the tokenizer or token embedder be different? I am genuinely confused by how the same code works for so many different models.
It's complicated, but basically because most are Llama-architecture models. Meta all but set the standard for open-source LLMs when they released Llama 1, and anyone trying to deviate from it has run into trouble because their models don't work with the hyper-optimized Llama runtimes.
Also, there's a lot of magic going on behind the scenes with configs stored in gguf/huggingface format models, and the libraries that use them. There are different tokenizers, but they mostly follow the same standards.
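To make the "configs behind the scenes" point concrete, here's a toy sketch of how one loader can serve many models: it reads an architecture string from the model's stored metadata (config.json in Hugging Face repos, key-value metadata in GGUF) and dispatches to a matching implementation. All names below are hypothetical, not any real library's API.

```python
# Toy sketch of config-driven model loading (all names hypothetical).
# Real loaders do something similar: read the architecture name from
# config.json / GGUF metadata, then dispatch to the right model class.

ARCH_REGISTRY = {}

def register(name):
    """Register a model class under an architecture name."""
    def wrap(cls):
        ARCH_REGISTRY[name] = cls
        return cls
    return wrap

@register("llama")
class LlamaModel:
    def __init__(self, config):
        self.hidden = config["hidden_size"]

@register("mistral")
class MistralModel:
    def __init__(self, config):
        self.hidden = config["hidden_size"]

def load_model(config):
    # One code path for every model: the metadata says which class to build.
    arch = config["architecture"]
    if arch not in ARCH_REGISTRY:
        raise ValueError(f"unsupported architecture: {arch}")
    return ARCH_REGISTRY[arch](config)

model = load_model({"architecture": "llama", "hidden_size": 4096})
print(type(model).__name__)  # LlamaModel
```

This is also why "deviating from Llama" hurts: a model whose architecture string isn't in the runtime's registry simply won't load.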
Can any kind soul explain the difference between GGUF, GGML, and all the other model packaging formats I am seeing these days? I was used to .pth and the format TF uses. Is this all to support inference, or quantization? Who manages these formats, or are they brewing organically?
I think it's mostly an organic process arising from the ecosystem.
My personal way of understanding it is this - the original sin of model weight format complexity is that NNs are both data and computation.
Representing the computation as data is the hard part, and that's where the simplicity falls apart. Do you embed the compute graph? If so, what do you do about different frameworks supporting overlapping but distinct sets of operations? Do you need the artifact to make training reproducible? Well, that's an even more complex computation that you have to serialize as data. And so on.
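A toy illustration of the "computation as data" problem (purely illustrative, not any real serialization format): a compute graph can be stored as a plain list of ops, but it only round-trips if the loading framework implements every op name, which is exactly where frameworks diverge.

```python
import json

# A tiny compute graph serialized as data (illustrative only).
graph = [
    {"op": "matmul", "inputs": ["x", "w"], "output": "h"},
    {"op": "relu",   "inputs": ["h"],      "output": "y"},
]

# This hypothetical loader's op set; another framework's set overlaps
# but is not identical, so some graphs won't load there.
SUPPORTED_OPS = {"matmul", "relu", "add"}

def load_graph(serialized):
    ops = json.loads(serialized)
    for node in ops:
        if node["op"] not in SUPPORTED_OPS:
            raise ValueError(f"unsupported op: {node['op']}")
    return ops

blob = json.dumps(graph)
print(len(load_graph(blob)))  # 2
```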
It's all mostly just inference, though some train LoRAs directly on quantized models too.
GGML and GGUF are the same lineage; GGUF is the newer version that adds more metadata about the model, so it's easier to support multiple architectures, and it also embeds prompt templates. These can run CPU-only, or be partially or fully offloaded to a GPU. With K-quants, you can get anywhere from a 2-bit to an 8-bit GGUF.
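The core idea behind those k-bit quants is per-block scaling: store one float scale per block of weights plus small integers. This is a deliberately simplified sketch; real GGUF K-quants use more elaborate block layouts with extra scale/min values.

```python
# Simplified per-block symmetric quantization (illustrative only;
# real GGUF K-quants use fancier block formats).

def quantize_block(weights, bits=4):
    # Map floats onto signed ints in [-(2**(bits-1)-1), 2**(bits-1)-1]
    # using a single scale for the whole block.
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax or 1.0
    return scale, [round(w / scale) for w in weights]

def dequantize_block(scale, qs):
    return [q * scale for q in qs]

block = [0.12, -0.5, 0.31, 0.07]
scale, qs = quantize_block(block, bits=4)
restored = dequantize_block(scale, qs)
# Round-to-nearest keeps each weight within half a quantization step.
print(max(abs(a - b) for a, b in zip(block, restored)) <= scale / 2)  # True
```

Fewer bits means a coarser grid (at 2 bits, qmax is only 1), which is why low-bit quants trade quality for size.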
GPTQ was the GPU-only optimized quantization method. It was superseded by AWQ, which is roughly 2x faster, and now by EXL2, which is better still. These are usually only 4-bit.
Safetensors and PyTorch .bin files are raw float16 model files; these are only really used for continued fine-tuning.
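Some back-of-the-envelope math shows why the bit width dominates file size. Roughly, size ≈ parameters × bits / 8 bytes; this ignores quantization metadata (per-block scales and such), which adds some overhead on top.

```python
# Rough model file size estimate: params * bits_per_weight / 8 bytes.
# Ignores quantization metadata (scales, etc.), which adds overhead.

def approx_size_gb(n_params, bits):
    return n_params * bits / 8 / 1e9

params_7b = 7e9
print(approx_size_gb(params_7b, 16))  # 14.0 -> float16 safetensors/.bin
print(approx_size_gb(params_7b, 4))   # 3.5  -> a 4-bit quant
```

That 4x shrink from float16 to 4-bit is what makes partial or full GPU offload of big models feasible on consumer cards.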
Of the loaders I commonly use, I've only seen the embedded prompt template read by text-generation-webui. In the GGML days it had a long hardcoded list of known models and the templates they use, so a template could be auto-selected (which was often wrong); now it just grabs the template from the model directly and sets it when the model is loaded.
Err... media people take visual quality and aesthetics very, very seriously. The director has a vision, and the tech goes to amazing lengths to support it. It is a different world, as the original post said.
I also worked at IBM (Research) early in my career. Most folks I know left, but some good folks went into manager/leader positions. I always wonder... do managers/leaders get paid better at IBM Research? Because the people there are really good.