Hacker News | photon_lines's comments

The whole goal of quantisation is to put the data into 'bins' so that it can easily be 'packed' and represented using fewer bits (less information). You can think of it essentially like rounding (3.14159 -> 3). Now, sometimes the distribution of the data is non-ideal for separating it into bins. Let's say our rounding rule is simple: we just use a floor function, so 2.45 maps to 2, 6.4543 maps to 6, etc. If we had a set of numbers which looked like this: [3.11, 4.43, 5.78, 12.33, 34.32], they would simply map to [3, 4, 5, 12, 34]. We have one huge outlier in our data (34.32), so to create bins covering this set of numbers we would need 6 bits of information (2 to the power of 6 = 64 possible values), and that's mostly down to that single outlier. To get rid of this problem, the algorithm applies a random rotation matrix which 'distorts' the original data so that it is more evenly distributed among the possible bins assigned to the data set. In linear algebra, a rotation matrix is an orthogonal matrix. When you multiply your vector by this matrix, you aren't changing the "amount" of data (the length of the vector remains the same), but you are recalculating every single number in that vector as a weighted sum of the originals. By the Central Limit Theorem, when you sum up many random things, the result tends to look like a bell curve. This is the magic TurboQuant relies on: they don't know what your data looks like, but they know that after the rotation the data will follow a predictable, near-Gaussian (technically Beta) distribution, and they use this fact to transform the original data into a more 'tightly packed' distribution which allows them to pack (or quantise) the information far more efficiently.
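A small sketch of the rotation idea in NumPy (the random orthogonal matrix here is built via QR decomposition; the seed and names are my own, not TurboQuant's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Original data with one large outlier -- the range forces us to use many bits.
x = np.array([3.11, 4.43, 5.78, 12.33, 34.32])

# A random orthogonal (rotation) matrix, obtained by QR-decomposing
# a matrix of Gaussian noise.
q, _ = np.linalg.qr(rng.normal(size=(5, 5)))

# Each entry of y is a weighted sum of all entries of x.
y = q @ x

# The rotation preserves the vector's length (orthogonality guarantees this)...
print(np.linalg.norm(x), np.linalg.norm(y))

# ...but the outlier's energy gets smeared across all coordinates,
# so the values are typically far more evenly spread out.
print(x)
print(y)
```

Running this a few times with different seeds shows the rotated coordinates clustering much more tightly than the originals.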
If most of the transformed data is huddled together in a predictable bell-curve shape, you can pack your bins tightly around that shape, leading to much higher precision with fewer bits needed to store it. For example, after applying a rotation matrix, our original vector [3.11, 4.43, 5.78, 12.33, 34.32] might get mapped to something like [8.12, 8.65, 9.25, 10.53, 12.86], and we can create bins which are both more accurate and need fewer bits to hold our original data set. To create optimal bins, the Lloyd-Max algorithm is used. This algorithm is the gold standard for 1D quantisation: its goal is to find the best places to put your "boundaries" (where you cut the data) and your "reconstruction values" (the numbers you store) to minimise the Mean Squared Error (MSE). After applying this, you have your 'rounded' values (the quantised data), but there is still an error value missing from our data set, and this is where the residual bit comes in. That bit doesn't represent the original data (or vector) - it represents our 'bias' after we apply the above algorithms. It's basically a '1-bit note' which allows you to cancel out the bias terms our quantisation algorithm produces, making the 'interactions' (or inner products) extremely accurate again when we multiply our values together, even after transforming our original data. Does this make sense?
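For the curious, the Lloyd-Max idea can be sketched as 1-D k-means: alternate between placing boundaries halfway between adjacent reconstruction values, and moving each reconstruction value to the mean of the points in its bin. This is a minimal illustration (function and variable names are my own, not from the paper):

```python
import numpy as np

def lloyd_max(data, n_levels, n_iters=50):
    """Find n_levels reconstruction values minimising MSE for 1-D data."""
    data = np.sort(np.asarray(data, dtype=float))
    # Start with levels spread evenly over the data range.
    levels = np.linspace(data[0], data[-1], n_levels)
    for _ in range(n_iters):
        # Optimal boundaries sit halfway between adjacent levels.
        bounds = (levels[:-1] + levels[1:]) / 2
        bins = np.digitize(data, bounds)
        # Optimal level for each bin is the mean of the points inside it.
        for k in range(n_levels):
            pts = data[bins == k]
            if pts.size:
                levels[k] = pts.mean()
    bounds = (levels[:-1] + levels[1:]) / 2
    return levels, bounds

# Quantise the (hypothetical) rotated values down to 2 bits (4 levels).
vals = np.array([8.12, 8.65, 9.25, 10.53, 12.86])
levels, bounds = lloyd_max(vals, n_levels=4)
quantised = levels[np.digitize(vals, bounds)]
print(levels)     # the learned reconstruction values
print(quantised)  # each input snapped to its bin's reconstruction value
```

The remaining gap between `vals` and `quantised` is exactly the kind of error the residual bit is there to account for.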

Amazing explanation! Thank you so much for taking the time to put it together. It makes a lot of sense. I’m not the one who asked the question, but I was impressed by such an eloquent and clearly explained answer.

Thank you! I'm glad you found it helpful (and that others did too)!!

This is a fantastic explanation. Thank you. The only part I am not following is how it is guaranteed that 1 bit is sufficient for the error value. Is this something the Lloyd-Max algorithm is responsible for ensuring? (Seems to me that if your quantization algorithm is crappy enough, you could need a large number of bits to store the error.)

Added to my non-llm username list :)

Thanks so much for the explanation


Wow, thank you for the explanation. Such a complex topic and yet you’ve made it simple to understand.

I wonder what the limit of quantization is, i.e. at what point it starts to destroy the logic in the weights?

I had to read this over a few times to piece it together, thanks for the thorough and digestible explanation!

Thank you.

Thank you Chase -- I was an early Watsi supporter (and still am actually) but you just reminded me I need to donate soon haha :) Either way fantastic work and thank you!


Thank you so much!


I don't have a personal site at the moment, but I do have a blog: https://photonlines.substack.com/

Some of my projects: https://github.com/photonlines


I love your website. Very clear and to the point.


You do understand that you can run these tools locally right? The code is fully available and open source: https://github.com/blgardner/prism.tools


I love what you have here. Thank you for open sourcing your work -- but why the custom license? Why not just do a standard MIT license?


I believe the MIT license leaves it open for others to host the files for public access. I’d like them to be hosted in one place, but others are free to host for their own or business use and/or use them locally. It’s just a way for me to protect my rights as the owner/creator.


You could use the AGPL, anyone who hosted it would then need to share the source code of any modifications they've made.


All are invited to modify the code to suit their needs, but not provide it to the public. If they want to serve it on a local network that is fine. These tools were made to be freely available to all and to modify for their own personal/business needs. If the need arises the latest version will be on Github.


Ahhh OK - that makes sense - thank you.


If anyone is curious to see what Watson actually was, you can find it here (it was nowhere near a generalized large language model -- it was mostly made for winning at Jeopardy): https://www.cs.cornell.edu/courses/cs4740/2011sp/papers/AIMa...


Why in the world would economists need to study this? It's been known for decades, if not centuries, that large bureaucracies are dysfunctional. The main reasons: 1) the incentives to do great work aren't there (most of the credit for a huge company's success goes to the CEO, who gets 100X the salary of a regular worker while usually delivering pretty much nothing); 2) politics usually plays a huge role, which hands a big advantage to your competition (i.e. your competition gets to spend less time on politics and more time on the actual product); and 3) human beings don't work well in groups larger than 100-250 due to the overwhelming complexity of the communication needed to make that kind of structure work. Incentives, though, are I think the primary driver: most people at companies like IBM don't have any incentive to actually care about the product they produce, and that's the secret behind the ruin of almost every large company.

Edit: you also seem to be giving too much credence to Watson. Watson was actually mostly a marketing tool designed to win at Jeopardy and nothing else. It was constructed specifically for that use-case and was nowhere near the architecture of a general transformer, which is capable of picking up meta-patterns within language and structurally understanding it. You can read about Watson's design and architecture here if you're curious: https://www.cs.cornell.edu/courses/cs4740/2011sp/papers/AIMa...


More like we need psychologists to ask "why are companies still working with IBM's efficiencies 30 years after its peak?" The workers don't have to care but the businesses dealing with IBM should.


I may be wrong, but I think it's mostly for things like enterprise support in case something goes wrong. IBM has had a large footprint in enterprises (WebSphere MQ, etc.). People don't want disruptions; if you run your own Kafka cluster, your in-house engineers are accountable for everything. So having enterprise support for the product/infra gives a sense of safety, and at times rightly so. It depends on a lot of factors: risk appetite, capabilities of in-house engineers, what's at stake, and mostly psychological safety.


> most of the credit for a huge company's success goes to the CEO who gets 100X the salary of a regular worker while delivering usually pretty much nothing

Well, in Confluent's case I'm not so sure that's true given that their CEO is also the company founder as well as one of the original authors of Apache Kafka.


Not Confluent, IBM.


Exciting stuff from a fantastic team.


Happy Thanksgiving everyone -- I've mostly been a lurker here over the last 20 years and I'm thankful for being able to interact with such a bright and vibrant community full of thinkers, doers and explorers -- you guys definitely changed my life for the better and inspired me in many, many ways.

