An 11-year-old dupe, I know. But it’s the first time I’ve seen it, and it just added to my admiration for him. And it’s just as applicable today as it was 24 years ago!
Edit: Sorry, I'm not sure if this is a quant, but it says 'finetuned' from the Google Gemma 4 parent snapshot. It's the same size as the UD 8-bit quant though.
For the best-quality reply, I used the Gemma-4 31B UD-Q8_K_XL quant with Unsloth Studio to summarize the URL with web search. It produced 4.9 tok/s (including web search) on a MacBook Pro M1 Max with 64GB.
Here's an excerpt in its own words:
Unsloth Dynamic 2.0 Quantization
Dynamic 2.0 is not just a "bit-reduction" but an intelligent, per-layer optimization strategy.
- Selective Layer Quantization: Instead of making every layer 4-bit, Dynamic 2.0 analyzes every single layer and selectively adjusts the quantization type. Some critical layers may be kept at higher precision, while less critical layers are compressed more.
- Model-Specific Tailoring: The quantization scheme is custom-built for each model. For example, the layers selected for quantization in Gemma 3 are completely different from those in Llama 4.
- High-Quality Calibration: They use a hand-curated calibration dataset of >1.5M tokens specifically designed to enhance conversational chat performance, rather than just optimizing for Wikipedia-style text.
- Architecture Agnostic: While previous versions were mostly effective for MoE (Mixture of Experts) models, Dynamic 2.0 works for all architectures (both MoE and non-MoE).
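To make the "selective layer quantization" idea above concrete, here's a rough Python sketch of what a per-layer quant plan could look like. This is not Unsloth's actual code; the sensitivity scores, thresholds, and tensor names are all made up for illustration, and in practice the scores would come from a calibration run rather than being hard-coded.

```python
# Illustrative sketch only -- not Unsloth's implementation. It shows the idea
# behind per-layer (dynamic) quantization: pick a quant type per tensor based
# on how sensitive the output is to that tensor, instead of one global bit
# width. The names, thresholds, and scores below are assumptions.

def choose_quant_type(layer_name: str, sensitivity: float) -> str:
    """Map a tensor to a GGUF-style quant type based on its sensitivity."""
    # Embeddings and the output head are commonly kept at higher precision.
    if "embed" in layer_name or "output" in layer_name:
        return "Q8_0"
    if sensitivity > 0.8:    # very sensitive: keep near full precision
        return "Q8_0"
    if sensitivity > 0.4:    # moderately sensitive: mid-size quant
        return "Q6_K"
    return "Q4_K_M"          # everything else gets compressed hardest

def build_quant_plan(sensitivities: dict[str, float]) -> dict[str, str]:
    """Build a per-tensor quantization plan from calibration-derived scores."""
    return {name: choose_quant_type(name, s) for name, s in sensitivities.items()}

# Toy example; real scores would come from a calibration dataset
# (chat-style text in Unsloth's case), not literals like these.
plan = build_quant_plan({
    "token_embed": 0.9,
    "blk.0.attn_q": 0.7,
    "blk.0.ffn_up": 0.2,
    "output.weight": 0.95,
})
print(plan)
```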
Wow, I just remembered using AES when I wrote an 'accessory' (menu bar app) that converted bitmaps to vectors for an ST DTP app that supported both. An early form of plugin, I suppose. Pretty far ahead of the MS mess at the time.
> As I recall, there were tons of books about GEM for the Atari ST, at least in Europe.
Yes, there were, but compared with the textbooks and Microsoft-supplied documentation for Windows, they were really not good. In the UK they were translated (not well) from German. At least, all the ones I owned were almost completely lacking in examples, and examples are really what you want when learning to use something.
Man, I spent hours and hours just last month trying to reverse-engineer the original Notator/Creator dongle and get Notator to launch in emulation by patching Hatari to emulate the dongle.
Codex & Gemini & I had something almost working. That dongle was evil and crazy complex: a fairly complex CPLD that depended on system timing, and in the end the emulator just can't fulfill whatever contract the software expects from the bus plus the emulated dongle.
Is the software still attractive to use after all these years, or why else go to these extremes? It sounds like it's intimately intertwined with the dongle, if the check routines can't simply be patched.
I have two Falcons here; they're compelling machines and still fun to play with.
But if I were to ask for a machine repro, a new motherboard, it would be in a different PCB form factor so it fits an ITX or mini-ITX case, because it's the cases and keyboards that go, not the machine.
Apple sued DRI, which resulted in the crippling of GEM; the most glaring change I remember was static windows. You heard that right: in the PC version, windows were not resizable and had fixed screen locations.
Thankfully, Atari had licensed GEM for its 68000 machines before the lawsuit and wasn't affected by these changes. The Atari ST (Sixteen/Thirtytwo) was very Mac-like at the time. It even ran Mac OS from Apple ROMs (Spectre 128 and Aladin) on its much cheaper hardware.
When the Mac and Atari ST first hit the market in the '80s, there were comics created in this 1-bit "ordered-dither" style. For error-diffusion dithering (Floyd-Steinberg etc.) you needed more bits per pixel in the working buffer, to carry the error.
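For anyone curious what "carrying the error" means, here's a minimal Floyd-Steinberg sketch in plain Python (no external libraries, not from the comment above): the output is still 1-bit, but the working buffer holds fractional values so the rounding error can be pushed onto neighbouring pixels. Ordered dithering, by contrast, just compares each pixel against a fixed threshold matrix and needs no such buffer.

```python
# Minimal Floyd-Steinberg error-diffusion sketch. The input is 8-bit
# grayscale, the output is 1-bit (0 or 255); the intermediate buffer is
# float so the diffused error doesn't get lost to rounding.

def floyd_steinberg(gray, width, height):
    """gray: flat list of 0..255 values; returns a flat list of 0/255."""
    buf = [float(v) for v in gray]      # higher-precision working copy
    out = [0] * (width * height)
    for y in range(height):
        for x in range(width):
            i = y * width + x
            old = buf[i]
            new = 255.0 if old >= 128 else 0.0
            out[i] = int(new)
            err = old - new             # quantization error to diffuse
            if x + 1 < width:
                buf[i + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    buf[i + width - 1] += err * 3 / 16
                buf[i + width] += err * 5 / 16
                if x + 1 < width:
                    buf[i + width + 1] += err * 1 / 16
    return out

# Tiny horizontal gradient as a smoke test.
w, h = 8, 2
img = [int(255 * x / (w - 1)) for y in range(h) for x in range(w)]
print(floyd_steinberg(img, w, h))
```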
Thank you for the follow up! Big fan of your models here, thanks for everything you are doing!
Works fine on macOS now (chat only).
On Ubuntu 24.04 with two GPUs (3090 + 3070), it appears that llama.cpp sometimes uses the CPU and not the GPU, judging from the tok/s and CPU load for identical models run with Unsloth Studio vs. just llama.cpp (bleeding edge).
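Not the commenter's setup, but one quick way to sanity-check whether layers actually land on the GPU is to load the same GGUF through the llama-cpp-python binding with verbose logging and all layers offloaded. The model path is hypothetical, and this assumes the binding was built with CUDA support.

```python
# Assumption-laden sketch, not a fix for Unsloth Studio: load a GGUF via
# llama-cpp-python with verbose logging, which prints how many layers were
# offloaded to the GPU, then run a tiny completion to compare tok/s.

from llama_cpp import Llama

llm = Llama(
    model_path="/models/gemma-q8.gguf",  # hypothetical path
    n_gpu_layers=-1,                     # -1 = try to offload every layer
    verbose=True,                        # startup log reports GPU offload
)

out = llm("Say hi in five words.", max_tokens=16)
print(out["choices"][0]["text"])
```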
https://news.ycombinator.com/item?id=7567159