
Author here. When I tried to understand diffusion models, I realized that the code and math can be greatly simplified, which led to me writing this blog post and diffusion library.

Happy to answer any questions.



As a researcher, there are a lot of diffusion blogs that I do not like. But I really do like this one! It does a great job of getting to the meat of things, showing some of the complexities (often missing elsewhere) without getting lost or distracted. I especially like the discussion of trajectories, since it motivates many of the things people struggle with (e.g. schedulers). That can be hard to write. Albeit not as complete, I think this is far more approachable than Song's or Lilian's blogs (great resources too, but for a different audience). Great job! I'm actually going to recommend this to others.

FWIW, a friend of mine (and not "a friend of mine") wrote this minimal diffusion implementation a while back that I've found useful, and it's a bit more "full" w.r.t. DDPM. Mostly dropping it here since I saw others excited to see code, and it could provide good synergy: https://github.com/VSehwag/minimal-diffusion/


Glad you liked it! Just curious, what else would you like to see to make the post more complete? I had focused on getting through the basics of diffusion training and sampling as concisely as possible, but perhaps in future posts I can delve deeper into other aspects of diffusion.


That's actually hard for me to answer. I'm afraid to taint it. Honestly, I'd be wary to touch it, especially with advice coming from someone already well versed in the domain. It's too easy to accidentally complexify and end up losing your target audience. But maybe the connection to VAEs could be made clearer? Maybe introduce the topic of optimization a bit more gently? Then again, it's a hard line to walk, because you've successfully written something that is approachable to a novice. This is the same problem I suffer when trying to teach, haha. But the fact that you're asking and looking to improve is the biggest sign you'll continue to do well.

For later topics, I think there's a lack of discussion about things like score modeling, the difficulties of understanding latent representations (so many misunderstand the curse of dimensionality), and Schrodinger Bridges. There's so much lol

The best advice I can give is to listen to anyone complaining. It's hard because you have to see beyond their words and find their intent (I think this also makes complaints more digestible and less emotionally draining/hurtful). Especially consider that a novice isn't going to have the right words to correctly express their difficulties, and it may just come out looking like anger when it's frustration.

Just keep doing your best :)


> FWIW, a friend of mine (and not "a friend of mine")

What? Do you mean a coworker or something as compared to a friend?


They are clarifying that "a friend of mine" is not a euphemism for themselves, because it's more common to use it as a euphemism for yourself than it is to actually talk to strangers online about your friend's life, problems, opinions etc.


Oh, huh. I now recognize what you're referring to but I never would have realized that without being explicitly told so. Thank you.


In your example images at the very end, the momentum term seems to have a deleterious effect on the digital painting of the house (the door is missing in the gamma = 2.0 image). I would like to know more details of that example to build an intuition for the effect of your gradient-informed ddim sampler.

As someone who's spent a little time experimenting with sampling procedures on Stable Diffusion, I was also hoping for a comparison to DDIM in terms of convergence time/steps. Is there a relationship between momentum, convergence, and error? (i.e., your momentum sampler at 16 steps is ~equivalent to DDIM at 20 steps ± %err_term)

thanks for the excellent post


It is hard to quantify the performance of samplers on individual images, but we have quantitative experiments on pretrained models in our paper (https://arxiv.org/pdf/2306.04848.pdf). Table 2 has a comparison to other samplers and Figure 9 plots the effects of the momentum term on convergence.
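For intuition, here is a hypothetical sketch of the core idea as I read it ("a DDIM-style step, plus momentum on the noise prediction"). The function names, the sigma parametrization, and the exact averaging rule are all my assumptions, not the paper's algorithm; treat the paper itself as authoritative.

```python
import torch

def ddim_step(x, eps_pred, sigma_cur, sigma_next):
    # Deterministic DDIM-style update in the sigma parametrization:
    # move x toward the next noise level along the predicted noise direction.
    return x + (sigma_next - sigma_cur) * eps_pred

def sample_with_momentum(model, x, sigmas, gamma=1.0):
    # Hypothetical sketch: keep a running average of the noise predictions.
    # gamma = 1 recovers plain DDIM-style stepping; gamma > 1 weights the
    # newest prediction more heavily, extrapolating past it.
    eps_avg = None
    for sigma_cur, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        eps = model(x, sigma_cur)
        eps_avg = eps if eps_avg is None else gamma * eps + (1 - gamma) * eps_avg
        x = ddim_step(x, eps_avg, sigma_cur, sigma_next)
    return x
```

With gamma = 1 the running average is just the latest prediction, so this reduces exactly to repeated plain DDIM-style steps.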


Your `get_sigma_embeds(batches, sigma)` seems to not use its first input? Did you mean to broadcast sigma to shape (batches, 1)?


My intention was to omit the details of batching in the blog post for clarity of exposition, and I'll update the post accordingly. Sorry for the confusion, but you can see the full implementation here: https://github.com/yuanchenyang/smalldiffusion/blob/main/src...
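For readers following along, a hypothetical sketch of how the batched version could look: broadcast the scalar sigma to one value per batch element, then embed it. The sin/cos-of-log-sigma embedding below is an assumption for illustration, not necessarily what the library does; see the linked repo for the real implementation.

```python
import torch

def get_sigma_embeds(batches: int, sigma: torch.Tensor) -> torch.Tensor:
    # Broadcast a scalar noise level to one copy per batch element.
    sigma = sigma.expand(batches).unsqueeze(1)             # shape: (batches, 1)
    # Embed each sigma as (sin(log(sigma)/2), cos(log(sigma)/2)).
    s = torch.log(sigma) / 2
    return torch.cat([torch.sin(s), torch.cos(s)], dim=1)  # shape: (batches, 2)

emb = get_sigma_embeds(4, torch.tensor(1.0))
print(emb.shape)  # torch.Size([4, 2])
```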


I am curious whether any of these concepts derive from physics principles, the same way neural networks are modeled after biological neural networks? Maybe you have some insights on that connection?


Your post is awesome and explains something nobody else has, thanks!


This looks great! How long does it take (on what hardware) to train the toy models? Such as the `fashion_mnist.py` example? Thanks.


The 2D toy models such as the swissroll take about 2 minutes to train on a CPU, whereas the fashionMNIST model takes a couple of hours on any modern GPU.
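To give a sense of why the 2D case is so cheap: a self-contained sketch of a swissroll-style training loop is below. This is not smalldiffusion's actual code; the dataset generator, MLP architecture, and log-uniform noise schedule are all stand-ins for illustration.

```python
import math
import torch

torch.manual_seed(0)

# Stand-in 2D swiss-roll dataset (the library has its own loader).
t = torch.rand(1000) * 3 * math.pi + 1.5 * math.pi
data = torch.stack([t * torch.cos(t), t * torch.sin(t)], dim=1) / 10

# Tiny MLP that predicts the noise added to a point, given (x, log sigma).
model = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 2),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(500):
    x0 = data[torch.randint(0, len(data), (128,))]
    sigma = torch.exp(torch.rand(128, 1) * 4 - 2)  # log-uniform noise levels
    eps = torch.randn_like(x0)
    x_noisy = x0 + sigma * eps                     # forward (noising) process
    pred = model(torch.cat([x_noisy, sigma.log()], dim=1))
    loss = ((pred - eps) ** 2).mean()              # denoising objective
    if step == 0:
        first_loss = loss.item()                   # for tracking progress
    opt.zero_grad(); loss.backward(); opt.step()
```

A loop like this runs in seconds on a CPU; image-scale models are slow mainly because the denoiser becomes a U-Net over pixels rather than a tiny MLP over 2D points.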



