Don't think about diffusion as denoising, but rather as learning a delta operator, or more precisely, the inverse of one. I don't know how diffusion language models work exactly, but if I were to hazard a guess, I would say you may think of a sentence as a matrix of values, the operator as simply filling a line with zeros, and the inverse, which is what the model learns, as adding it back.
This is equivalent to cutting an image into blocks and learning to generate images incrementally by inpainting the missing blocks. This inpainting, mind you, can happen over multiple steps, each step adding a bit more detail back into the block.
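A toy sketch of the analogy above, assuming the "sentence as matrix" framing: each row is a token embedding, the forward operator zeroes one row, and an idealized inverse restores it over several steps. The blending loop is purely illustrative (a real model would learn the restoration; here we cheat and use the known target):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "sentence": 5 tokens, each a 4-dim embedding row.
sentence = rng.normal(size=(5, 4))

# Forward "delta" operator: fill one line (token 2) with zeros.
masked = sentence.copy()
masked[2] = 0.0

# Idealized inverse: restore the missing row incrementally over
# several steps, mimicking multi-step inpainting of a block.
steps = 4
restored = masked.copy()
for t in range(1, steps + 1):
    restored[2] = (t / steps) * sentence[2]  # blend in more each step

print(np.allclose(restored, sentence))
```

The point of the multi-step loop is only to show the shape of the process: each iteration moves the zeroed block closer to its original content, rather than filling it in all at once.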