The underlying NSD dataset used by the three prominent (and impressive) recent papers on this topic (including the one linked here) is a bit problematic, because it invites exactly this confusion (classification/identification rather than reconstruction): it only has 80 categories, and it was not recorded with reconstruction in mind.
Reconstruction is the primary and difficult aim, and it is what people want and expect when they talk about "mind reading". Classifying something from brain activity has long been solved and is not difficult; it is almost trivial with modern data sizes and quality. At 80 categories, and with data from higher visual areas, you could even use an SVM as the basic classifier, add some method for producing a similar blob shape from the activity (V1-V3 are map-like), and get good-looking results.
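A minimal sketch of how little machinery 80-way decoding needs. The data here is synthetic (random "voxel" patterns with a per-category mean), standing in for real fMRI betas; the 500 voxels and 20 trials per category are illustrative assumptions, not NSD's actual dimensions:

```python
# Sketch: 80-way category classification on synthetic "voxel" data.
# Each category gets its own mean pattern plus trial noise, a crude
# stand-in for category-selective responses in higher visual areas.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_categories, trials_per_cat, n_voxels = 80, 20, 500

prototypes = rng.normal(size=(n_categories, n_voxels))
X = (np.repeat(prototypes, trials_per_cat, axis=0)
     + rng.normal(scale=2.0, size=(n_categories * trials_per_cat, n_voxels)))
y = np.repeat(np.arange(n_categories), trials_per_cat)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = LinearSVC(max_iter=5000).fit(X_tr, y_tr)
print(f"80-way accuracy: {clf.score(X_te, y_te):.2f}  (chance = {1/80:.3f})")
```

Near-ceiling accuracy on this kind of setup implies nothing about reconstruction; it only shows that a linear classifier separates 80 category means easily.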
If you never check whether you are merely doing classification, you can easily get too-good-to-be-true results. With the newer methods relying on pretrained features, this classification shortcut can also hide deep inside the model, where it is easily missed.
The community is currently discussing to what extent this applies to these newer papers (start with the original post): https://twitter.com/ykamit/status/1677872648590864385?s=20
One thing they showed is that the 80 categories of that dataset collapse to just 40 clusters in semantic space.
(Kamitani has been working on the reconstruction question for a long time and knows all these traps quite well.)
The deeprecon dataset proposed as an alternative has been around for a few years and has been used in multiple reconstruction papers. It has many more classes, out-of-distribution "abstract" images, and no class overlap between train and test images, so it is well suited to proving that a method is actually reconstructing. But it is also an order of magnitude smaller than the NSD data used in the newer reconstruction studies. If you modify the 80-class NSD data so that train and test classes do not overlap, the two diffusion methods tested there do not work as well, but they still look like they do some reconstruction.
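The class-disjoint control mentioned above can be sketched in a few lines. The category labels are hypothetical stand-ins for the 80 NSD/COCO categories, and grouping by category is just the generic scikit-learn way to enforce the split (an assumption about tooling, not what the thread actually ran):

```python
# Sketch: split trials so that no category appears in both train and
# test. Grouping by category label guarantees each category lands
# wholly on one side of the split.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_trials = 1000
categories = rng.integers(0, 80, size=n_trials)  # one label per trial

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(np.zeros(n_trials), groups=categories))

overlap = set(categories[train_idx]) & set(categories[test_idx])
print("categories shared between train and test:", len(overlap))  # 0
```

Any decoder that only ever classified would lose its advantage under this split, which is what makes it a useful diagnostic.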
On deeprecon, the two tested diffusion methods fail to reconstruct the abstract OOD images (which NSD does not contain), something previous reconstruction methods could do.