
Yes there was. However this is a different paper, describing a different method, applied to a different dataset, with different results.

As the abstract says, "In particular, MindEye can retrieve the exact original image even among highly similar candidates indicating that its brain embeddings retain fine-grained image-specific information. This allows us to accurately retrieve images even from large-scale databases like LAION-5B. We demonstrate through ablations that MindEye's performance improvements over previous methods result from specialized submodules for retrieval and reconstruction, improved training techniques, and training models with orders of magnitude more parameters."

Note that LAION-5B has five billion images.



> To achieve the goals of retrieval and reconstruction with a single model trained end-to-end, we adopt a novel approach of using two parallel submodules that are specialized for retrieval (using contrastive learning) and reconstruction (using a diffusion prior).

You can think of contrastive learning as two separate models that take different inputs and produce vectors of the same length as outputs. This is achieved by training both models on pairs of training data (in this case, fMRI scans and the images the subject was viewing). A rough sketch of that setup is below.
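As a minimal illustration (not the paper's actual code; the encoder types, input sizes, embedding dimension, and temperature here are made-up placeholders), a CLIP-style contrastive step looks roughly like this in PyTorch:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Toy stand-ins for the two encoders: one maps flattened fMRI activity,
    # the other maps image features, both into the same embedding dimension.
    fmri_encoder = nn.Linear(4096, 256)    # hypothetical input/output sizes
    image_encoder = nn.Linear(2048, 256)

    def clip_style_loss(fmri_batch, image_batch):
        # Embed both modalities and L2-normalize so dot products are cosine similarities.
        f = F.normalize(fmri_encoder(fmri_batch), dim=-1)
        i = F.normalize(image_encoder(image_batch), dim=-1)
        logits = f @ i.T / 0.07              # similarity of every fMRI to every image
        targets = torch.arange(len(f))       # matching pairs sit on the diagonal
        # Symmetric cross-entropy pulls matching pairs together, pushes mismatches apart.
        return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

    loss = clip_style_loss(torch.randn(8, 4096), torch.randn(8, 2048))
    loss.backward()

The cross-entropy over the similarity matrix is what forces the two encoders to agree on a shared embedding space.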

What the LAION-5B result shows is that this training was done well enough that the two models produce similar vectors for nearly any matching image and fMRI pair. Once you have those vectors, retrieval is just nearest-neighbor search in the embedding space.
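For example (a toy sketch with a random database; in practice the candidates would be precomputed embeddings of real images):

    import torch
    import torch.nn.functional as F

    # Pretend database_embeddings holds precomputed, normalized embeddings of the
    # candidate images and query is the embedding produced from an fMRI scan.
    database_embeddings = F.normalize(torch.randn(100_000, 256), dim=-1)
    query = F.normalize(torch.randn(256), dim=-1)

    similarities = database_embeddings @ query       # cosine similarity to every candidate
    top_scores, top_indices = similarities.topk(5)   # the 5 most similar images

At the scale of five billion images you would use an approximate nearest-neighbor index rather than a brute-force matrix product, but the principle is the same.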

Then, they make a prior model which basically says “our fMRI vectors are essentially image vectors with an arbitrary amount of randomness in them (representing the difference between the contrastive learning models). Let’s train a model to learn to remove that randomness, then we have image vectors.”
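A heavily simplified version of that idea (the real model is a multi-step diffusion prior in the style of DALL-E 2; this toy version collapses it to a single denoising step with made-up layer sizes):

    import torch
    import torch.nn as nn

    # Toy "prior": given an fMRI-derived embedding plus a noised image embedding,
    # learn to predict the clean image embedding.
    prior = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 256))
    optimizer = torch.optim.Adam(prior.parameters(), lr=1e-4)

    def prior_step(fmri_emb, image_emb):
        noise_level = torch.rand(len(image_emb), 1)               # random amount of corruption
        noisy = image_emb + noise_level * torch.randn_like(image_emb)
        pred = prior(torch.cat([fmri_emb, noisy], dim=-1))        # condition on the fMRI embedding
        loss = ((pred - image_emb) ** 2).mean()                   # recover the clean embedding
        optimizer.zero_grad(); loss.backward(); optimizer.step()
        return loss

    prior_step(torch.randn(8, 256), torch.randn(8, 256))

The output of the trained prior can then be dropped into a diffusion model in place of a real image embedding.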

So yes, this is an impressive result at first glance and not some overfitting trick.

It’s also sort of bread and butter at this point (replace fMRI with “text” and that’s just what Stable Diffusion is).

There’ll be lots of these sorts of results coming out soon.


This is mostly correct, except that there is only one model. This model takes an fMRI scan and predicts two outputs. The first is specialized for retrieval and the second can be fed into a diffusion model to reconstruct images.

You can see the comparison in performance between LAION-5B retrieval and actual reconstructions in the paper. When retrieving from a large enough database like LAION-5B, we can get images that are quite similar to the seen images in terms of high-level content, but not so similar in low-level details (relative position of objects, colors, texture, etc.). Reconstruction with diffusion models does much better in terms of low-level metrics.


How is contrastive learning done with one model, exactly?

I agree only one is used in inference, but two are needed for training (otherwise how do you calculate a meaningful loss function?). Notice in the original CLIP paper, there's an image encoder and a text encoder, even though only the text encoder is used during inference. [0]

[0] https://arxiv.org/pdf/2103.00020.pdf


There are 2 submodules in our model — a contrastive submodule and a diffusion prior submodule, but they still form 1 model because they are trained end-to-end. In the final architecture that we picked there is a common backbone that maps from fMRIs to an intermediate space. Then there is an MLP projector that produces the retrieval embeddings and a diffusion prior that produces the stable diffusion embeddings.

Both the prior and the MLP projector make use of the same intermediate space, and the backbone + projector + prior are all trained end-to-end (the contrastive loss on the projector output and the MSE loss on the prior outputs are simply added together). A rough sketch of that training step is below.
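In rough PyTorch (layer sizes and the plain linear "prior" are placeholders, not the actual MindEye architecture), the combined objective looks something like:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    backbone = nn.Sequential(nn.Linear(4096, 1024), nn.ReLU())  # fMRI -> intermediate space
    projector = nn.Linear(1024, 768)                            # -> retrieval embeddings
    prior = nn.Linear(1024, 768)                                # -> embeddings for reconstruction

    params = [*backbone.parameters(), *projector.parameters(), *prior.parameters()]
    optimizer = torch.optim.Adam(params, lr=1e-4)

    def training_step(fmri, target_image_emb):
        h = backbone(fmri)
        # Contrastive loss on the projector output (retrieval objective).
        z = F.normalize(projector(h), dim=-1)
        c = F.normalize(target_image_emb, dim=-1)
        logits = z @ c.T / 0.07
        contrastive = F.cross_entropy(logits, torch.arange(len(z)))
        # MSE loss on the prior output (reconstruction objective).
        mse = F.mse_loss(prior(h), target_image_emb)
        # The two objectives are simply added and backpropagated through everything.
        loss = contrastive + mse
        optimizer.zero_grad(); loss.backward(); optimizer.step()
        return loss

    training_step(torch.randn(8, 4096), torch.randn(8, 768))

Because both losses flow back into the shared backbone, each objective can help the other.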

We found that this works better than first training a contrastive model, then freezing it and training a diffusion prior on its outputs (similar to CLIP + DALL-E 2). That is, the retrieval objective improves reconstruction, and the reconstruction objective slightly improves retrieval.


If it's still retrieving an image rather than reconstructing it, that's decently fine as long as the dataset is large enough, but that's not how diffusion models generally work, and I'd have expected the model to map the fMRI data to a wholly new image.


Please read the paper. Or at least the blog post. It's really quite readable.

They explain that they've done both retrieval and reconstruction, and have lots of pictures showing examples of each.

https://medarc-ai.github.io/mindeye/


If you can retrieve an image using a latent vector, it’s trivial to reconstruct it (decently well) with a diffusion model.


They tested themselves both on retrieval and reconstruction.



