Don't think about diffusion as denoising, but rather as learning a delta operator, or more precisely, the inverse of one. I don't know how diffusion language models work exactly, but if I were to hazard a guess, I would say you may think of a sentence as a matrix of values, the operator as simply filling a line with zeros, and the inverse, which is what the model learns, as adding it back.
This is equivalent to cutting an image into blocks and learning to generate images incrementally by inpainting the missing blocks. This inpainting, mind you, can happen over multiple steps, each step adding a bit more detail back into the block.
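A toy sketch of the analogy above, assuming the "sentence as matrix" framing: each row is a token embedding, the forward operator zeroes one row, and an idealized inverse restores it over several steps. The blending loop is purely illustrative (a real model would learn the restoration; here we cheat and use the known target):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "sentence": 5 tokens, each a 4-dim embedding row.
sentence = rng.normal(size=(5, 4))

# Forward "delta" operator: fill one line (token 2) with zeros.
masked = sentence.copy()
masked[2] = 0.0

# Idealized inverse: restore the missing row incrementally over
# several steps, mimicking multi-step inpainting of a block.
steps = 4
restored = masked.copy()
for t in range(1, steps + 1):
    restored[2] = (t / steps) * sentence[2]  # blend in more each step

print(np.allclose(restored, sentence))
```

The point of the multi-step loop is only to show the shape of the process: each iteration moves the zeroed block closer to its original content, rather than filling it in all at once.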