The fine article shows the low-res input and high-res output photos, but conspicuously does not show a high-res original from which the low-res input was derived.
Without comparing a high-res original photograph to the high-res output photograph, we do not know whether this fine technique is merely capable of producing nice-looking high-res imagery, or whether it can reproduce how an image of the subject would have looked had it been taken at higher resolution.
In other words, does the output of the technique match the actual object in the photograph?
That is indeed a shortcoming of this article in my opinion as well. If you want a comparison to the original high-res photos, there are some examples in the original SR3 paper: https://arxiv.org/pdf/2104.07636.pdf
Have not had a look at the CDM paper.
Fig. 9 of this paper is really interesting if you zoom in. It looks like if the model was not trained on the appropriate class label, it just goes completely off the rails. As previous commenters have noted, I would be very hesitant to use this for anything analytical, or where you are looking for something unexpected. For faces though, this is amazing.
There is an app called "face app" or whatever that already does a pretty good job of upscaling people's faces using state-of-the-art AI upscaling.
The result is impressive, but the moment you start to use it on someone you're actually familiar with, it becomes weird very quickly, for obvious reasons. The teeth, for example, are never right.
These kinds of "believable but not truthful" results are rampant in all these machine-learning-based tools. It's not very harmful when upscaling a few photos, I guess, but I've been bitten by it in the acclaimed translation service DeepL. I use it to translate Japanese to English frequently, and have found that it often (nontrivially) makes up sentences that don't exist in the original paragraphs, sometimes with the opposite meaning, or totally ignores parts of the text to make the result "more fluent". And unlike with traditional translation tools, these errors are very hard to notice if you know nothing about the original language. I have to use some more "primitive" translation tools from time to time and compare the results side by side to avoid such issues. It's frustrating.
I can recall seeing a conspiracy get traction on Twitter a few months back, where it was claimed that a photograph of a famous person was actually a body double. Someone used an ML upscaler to "enhance" the image, and their followers began scrutinizing the result: "The teeth are different!", "The nose shape is wrong!", "It's not the same person!"
> Not to mention what happens when some good-intending but ill-informed agency release an enhanced photo of a suspect on the run.
The more interesting situation is when such "enhanced photos" are used in prosecutions. I suspect that ML "forensics" techniques and questionable expert witnesses will be used to elevate hunches to guilty verdicts and raise (false) conviction rates.
I think it would be possible for any defense attorney to show how bad a job it does at reproducing the original, by showing actual high-res images vs. AI-enhanced upscales. Shouldn't be hard to demonstrate "reasonable doubt".
Sure, if you are evaluating whether the claims are accurate at that level of precision, similarity may not be relevant.
However, for the rest of us, evaluating whether the work is relevant in the surrounding world really is important. And for that, resemblance to the originals is critical in some use cases. I.e., we're evaluating fit for specific use cases.
If this AI can't produce upscaled photos that are close to the original photos, the application of this tech is severely limited. There's not going to be a CSI-style "enhance" moment in real life like the article claims.
No, it doesn't. A piece of code n bits long can only have a maximum of n bits of information in it; you can't get more than n bits of information out of it. It can't create information out of nothing.
I'm not sure what you mean, but that is pretty much what generative neural systems do. If by information you mean data that corresponds to reality? Then sure.
I guess you can say that the data is correlated with reality with a certain probability, and the better the model, the higher the probability?
> You can't squeeze n+1 bits out of n bits of information.
Sure, in the extractive sense. But in the generative sense you can.
The fact that the extra information is 'synthetic' and not 'natural' doesn't mean that it isn't extra info, just that it may or may not (probably not) correspond with ground truth.
Another way to think of it is that super-resolution is effectively a (possibly benign) man-in-the-middle. If what you're concerned about is the information flowing from Alice to Bob, Eve isn't adding any info, and may in fact be drowning the signal in more noise. But you can also see it as Eve communicating more info to Bob than Alice is to Eve. Whether what Eve is adding should be considered noise or signal is highly context dependent.
> A deterministic algorithm can't generate information where none exists.
Right. All the processes being used for these purposes are stochastic.
Edit: Actually they are deterministic, in the same way a pseudo-random number generator is. They typically rely on a 'seed' that would have to come from a truly random source to be non-deterministic, and it doesn't, so users get pseudo-random but reproducible results. But that's really getting into the weeds.
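For the curious, a minimal sketch of that "pseudo-random but reproducible" point, using numpy as a stand-in for whatever sampler these models actually use:

```python
import numpy as np

# A fixed seed makes the "stochastic" sampling fully reproducible:
# the same seed always yields the same sequence of pseudo-random draws.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

noise_a = rng_a.standard_normal(4)
noise_b = rng_b.standard_normal(4)

assert np.array_equal(noise_a, noise_b)  # identical draws from the same seed
```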
Very simple: you _predict_. If you can tell the low-res image is of a person's face, the space of possibilities when filling in the blanks in the high-res image is significantly reduced.
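A toy illustration of that narrowing, with made-up binary patches and a made-up 'prior' (nothing to do with the actual model, just to show how conditioning shrinks the candidate set):

```python
import itertools
import numpy as np

# Downsampling is many-to-one: lots of distinct 2x2 binary patches
# average down to the same single low-res value.
patches = list(itertools.product([0, 1], repeat=4))
observed = 0.5
consistent = [p for p in patches if np.mean(p) == observed]
print(len(consistent))  # 6 candidate patches fit the one observed pixel

# A prior constraint (here an arbitrary "vertical structure" rule, standing
# in for "this is a face") rules most of them out, so the guess becomes sharp.
plausible = [p for p in consistent if p[0] == p[2] and p[1] == p[3]]
print(plausible)  # only (0, 1, 0, 1) and (1, 0, 1, 0) remain
```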
It seems to me that it's basically like me (a non-artist) drawing something I saw and handing it to a good artist and asking them to draw that, but better.
They aren't drawing what I saw, but they are drawing a better representation, so it can satisfy my need to see the thing in physical form, but it can never be a real replacement.
If you ever have something that you would be happy to substitute a very good painting for a blurry image then this is good. If you need to know what something actually looked like in high def (license plate numbers, micro tumors) this is useless, or worse than useless if it ever gets admitted in court.
Not entirely true. The model can extract image information from the pixels that a human might not be able to see. Like how you can amplify the colors in a video of a face so that it visibly pulses red with your heartbeat. The information about your heartbeat was there all along, our eyes were just not able to extract / recognize it.
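That's the Eulerian video magnification trick. A rough sketch of the idea (my own simplification; the `frames` array, frame rate, band limits and gain are assumptions, not anyone's actual pipeline):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def magnify_pulse(frames, fps, low_hz=0.8, high_hz=3.0, gain=50.0):
    """Amplify per-pixel color variation in the typical heart-rate band.

    frames: float array of shape (T, H, W, 3) in [0, 1], a video of a face.
    Bandpass around ~48-180 bpm and add the amplified variation back, so the
    subtle pulse becomes visible even though it was always in the data.
    """
    b, a = butter(2, [low_hz, high_hz], btype="bandpass", fs=fps)
    pulse = filtfilt(b, a, frames, axis=0)  # temporal bandpass per pixel
    return np.clip(frames + gain * pulse, 0.0, 1.0)
```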
No, it does not provide sci-fi abilities to "enhance" resolution and extract new details.
Because those details are generated by AI.
For example, the woman in the photo might have different teeth in reality. We can't learn anything about her teeth because the teeth in the generated photo are one of many possible solutions that match the input.
Actually, the photo now has less information for practical purpose as you don't know which details are real and which have been manufactured.
So about the only gain is to improve the photo for aesthetic reasons.
I disagree. If you need to know an estimate or guess of certain details that aren't visible in lo-res, this is very useful because the AI is likely much better at inferring these details than a human.
Sure, it is still a guess but a better one than humans can make.
It knows people have teeth, but the teeth are only estimated, not real teeth. Maybe they have been copied from some other face. The point is, it is not her teeth.
Imagine it was low res car photo with unreadable plates. "Enhancing" them this way would not bring back the plates. It could paste some plates, because we know this part of car usually has plates, but the real information (actual registration number) is already lost and can't be brought back this way.
how do you know the registration number is actually lost? Surely if it's downscaled enough it will be lost, but there's probably a point at which the data is not lost to AI, yet unrecoverable for humans.
I see you don't understand. Either it is or it is not lost. By lost I mean unrecoverable with any method, because there simply is not enough information to recover it from.
In any photo some detail has been lost. This is trivially proven, as the amount of detail in the actual scene was many, many orders of magnitude more than in the photo file.
Detail that has been lost cannot be recovered, it can only be replaced with something that makes contextual sense which is what AI is doing here.
This can be dangerous. A lot of medical imaging deliberately avoids using any kind of lossy compression due to worries about artifacts in the image. Actually adding new pixels that are not in the raw image seems especially worrying.
I worry such funny algorithms find their way into hardware and start causing chaos in science and engineering. People do rely on COTS measuring equipment for a lot of important work, and there's a tacit assumption that the equipment tries to reflect reality.
I've mentioned this before[0], so quoting myself:
"for example, a research team may decide to not spend money on expensive scientific cameras for monitoring experiment, and instead opt to buy an expensive - but still much cheaper - DSLR sold to photographers, or strap a couple of iPhones 15 they found in the drawer (it's the future, they're all using iPhones 17, which is two generations behind the newest one). That's using COTS equipment. COTS is typically sold to less sophisticated users, but is often useful for less sophisticated needs of more sophisticated users too. But if COTS cameras start to accrue built-in algorithms that literally fake data, it may be a while before such researchers realize they're looking at photos where most of the pixels don't correspond to observable reality, in a complicated way they didn't expect."
Bit like how people post pictures taken with their phones, using a 'nofilter' hashtag, entirely unaware that the phone software has applied a filter of some sort automagically. They don't post 'raw' images.
Sadly, pharma companies & hospitals will probably prefer these types of images being used: "Oh more likely than not, there is something there - let's start you on this long-term course of expensive medication!".
This is probably an example that the writer came up with. I'm very sure that the people who work on this are well aware that the details it fills in may not match reality.
Right. This technology is very dangerous if used to compress & then 'uncompress' medical images. I used to be a bit more cautious, but I think if the model was specifically trained on x-rays or some type of medical images, it could do a very good job. I think the original image should always be shown in addition to the AI-upscaled image. Having both the original plus an AI-upscaled image that is 'correct' 90% of the time could be very useful.
When it comes to things like distinguishing a shadow on a scan, I think AI might actually be better at 'detecting' whether something is a real shadow or just very similar to a shadow. I think it's just one of those things where AI upscaling improves stuff ~80% of the time but is worse the other ~20%. The fundamental issue may become the same as with self-driving cars: people trust the AI too much and become inattentive themselves.
While you certainly can't add 'correct' information that doesn't already exist in an image, the upscaling could correctly make existing information more obvious. Assuming that the human brain functions pretty much like AI (or rather the opposite), at some point AI will become as competent, which means that eventually, with enough training & tweaking, it should be as good as or better than having a second human perspective.
> Right. This technology is very dangerous if used to compress & then 'uncompress' medical images.
It could actually be useful for compression as a predictor, but only if you also store the residual so that the true original image can be reconstructed.
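A minimal sketch of what that predictor-plus-residual scheme could look like (the `downsample`/`upscale` callables are hypothetical placeholders, not anything from the paper; in practice you'd entropy-code both outputs):

```python
import numpy as np

def compress(original, downsample, upscale):
    # Use the upscaler only as a predictor; the stored residual is what
    # makes the scheme lossless rather than merely "plausible-looking".
    low_res = downsample(original)
    prediction = upscale(low_res)              # plausible but possibly wrong detail
    residual = original.astype(np.int16) - prediction.astype(np.int16)
    return low_res, residual

def decompress(low_res, residual, upscale):
    # Re-run the same predictor, then add back the residual to recover
    # the true original image exactly.
    prediction = upscale(low_res)
    return (prediction.astype(np.int16) + residual).clip(0, 255).astype(np.uint8)
```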
The original paper[1] itself accidentally demonstrates how dangerous this will be. Look at the picture of the leopard and study the pattern of the spots around the face. The pattern on the upscaled image is clearly different from the original. The algorithm has just generated 'realistic'-looking spots where it thinks there should be spots, and they have no relation to reality.
> This would depend on the false positive / false negative rate.
I'm not a doctor, but I am a physicist and former pro photographer. What is noise and what is signal in an experiment whose output is an 'image' has nothing to do with what makes a photo look good to human eyes. Often the whole point of methods of visualization is to make the image look objectively bad so you can easily pick out the areas of interest by the fact that they are an eyesore. Applying upscaling to those images will actively destroy vital data.
It depends on the way it is used. If it's used knowingly, and as a last-resort effort just to make sure that nothing is there, then I don't see the problem.
Not sure what you mean, but if you're envisioning something like 'doctor looks at the original, doesn't think there's anything wrong, but then checks the upscaled image as well, just to be sure', then that is very dangerous, as it can lead to a significant increase in unnecessary testing.
It may not be exactly as dangerous as the opposite (doctor looks at image thinks there is something suspicious, checks upscaled image to see if it's there as well), but it's still very dangerous.
Thank you very much for posting the original. I agree with some of the other comments, while the generated faces look highly photo-realistic, in some cases they also look quite different from the actual person.
Not that different, mind you, but humans are obviously super sensitive to tiny changes in faces - it doesn't take that much to make it look like a different person altogether. For the non-face images it was much harder for me to really detect many differences, and they certainly didn't bother me.
Thank you for this link. The differences between the original and the upscaled SR3 model are highlighted mostly for me on the picture of the leopard. The facial markings are clearly different.
This is less upscaling and more using a seed to generate a believable high-res image. Which is interesting in and of itself, but I find myself mostly wondering how much variation you can get from the same starting seed.
Hard to point out the difference, but I feel it too.
It's adding information either way, I agree. The difference is that the old algos used information from the image itself, and this one uses information from a lot of other images.
1. Maybe I'm guilty of moving the goalposts, but super-resolution of faces isn't that 'Jaw-Dropping' after the recent GAN work showing that you can create hyper-realistic synthetic faces with zero input to guide it.
2. There are certain portions of the image that clearly do not contain enough resolution to be reconstructed satisfactorily. E.g. teeth, skin imperfections. I wonder how well a person would react if their teeth were either messed up or "fixed" by "the AI".
I found the compounding errors quite interesting, especially with the dog. The pixel changes originally caused by diffraction of light around the edges became a quite distorted skull shape with a rounded muzzle that resembled a poorly-done taxidermy job. The original photo of the line of teeth with a single dark spot is transformed into a bizarre serpentine line of teeth that would never exist in real life.
Wow, this is basically deconvolution. Can't wait to hear this applied to reverby audio. Reverb is basically blurring ('smearing' of sound) in the audio domain.
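For illustration, reverb really is convolution of the dry signal with a room impulse response, so the naive 'undo' is spectral division. A toy sketch (not a practical dereverb; the regularization is ad hoc, and real methods need priors, just like image super-resolution):

```python
import numpy as np

def naive_dereverb(wet, impulse_response, eps=1e-3):
    # Reverb = dry signal convolved with the room's impulse response, so
    # dividing in the frequency domain approximately inverts it. It blows up
    # wherever the response is near zero, hence the crude eps regularization.
    n = len(wet)
    H = np.fft.rfft(impulse_response, n)
    W = np.fft.rfft(wet, n)
    return np.fft.irfft(W / (H + eps), n)
```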
What's the best commercial or open-source software for photo upscaling these days? It would be so wonderful to breathe new life into very old family photos!
Pixelmator Pro[1] does a pretty good job with its "ML Super Resolution". Apparently Adobe have a similar "Super Resolution"[2]. One of the VQGAN-CLIP notebooks uses ISR[3] (but I haven't managed to get that working locally yet because of weird tensorflow version requirements.)
This would be good to know. Last week I had a job photographing whales with a drone. Usual legal distance is 300m but I had a permit to photograph from 80m. Meanwhile, I suspect the clients would want results that looked even closer. Being able to upscale the waves and whale details might actually work pretty well in software - it just has to look like a whale up close and not necessarily the exact whale photographed.
It will go from comedy to tragedy, as somebody will eventually get arrested and even convicted based on a high-quality picture of their face upscaled from a 16x16 noisy mess of pixels.
So the system is solving a high-res inpainting puzzle that - if filtered - looks similar to a low-res input?
The results are impressive because our brain can't do this quickly. They contain absolutely no additional information, but they _seem_ to, so this may lead to much harm.
What happens if you take an image of a portrait painting, reduce the resolution to pixelate at whatever resolution this upscaling model prefers, then run the model?
Will the resulting image appear even more realistic than the painting?
I don't understand what the "confusion rate" metric in the article was. Also, I don't see any comparison with the original high-res image so that we can see how true to life the generated images look.
> Also, I don't see any comparison with the original high-res image so that we can see how true to life the generated images look.
Was wondering about that too. It certainly produces realistic-looking high-res images, but especially when the article talks about potential uses ranging "from restoring old family photos to improving medical imaging", it seems like accuracy may be more valued than "looking realistic".
Spot on. This is not "zoom, enhance" — it is "fabricate plausible detail based on a training set". Using it for anything other than making pictures look nice would be disastrous.
Other people in these comments are talking about using it for law enforcement. Train it on a bunch of pictures of black people holding guns and now suddenly it will "reveal" guns in the hands of all black people in blurry CCTV footage. (This specific example is likely a little simplistic to be a problem in reality, but it demonstrates the problem of thinking it actually reveals some hidden detail.)
My understanding is that they present people with two images, the real original high-res image and the upscaled image, and ask "Which of these is the real image?". The extent to which people are confused demonstrates how good the algorithm is. If it were perfect, and there was no difference between the original and the upscaled image, then you'd expect people to pick each image about 50% of the time at random.
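So, as I read it, the metric is just the fraction of trials where the rater picked the model's output; something like:

```python
def confusion_rate(picked_generated):
    # picked_generated: one boolean per trial, True if the rater chose the
    # model's output as "the real photo". 50% means indistinguishable from
    # real; 0% means the rater always caught the fake.
    return sum(picked_generated) / len(picked_generated)
```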
So many of these upscaling technologies feel like the sketch of crime scenes. Artistic, to be sure, but probably not as good for actually filling in details as we'd like. And downright dangerous if they set unreasonably high expectations of fidelity.
When would anyone use this? When they want to super-zoom in on something? I think a more useful photo-upgrading tech would be trying to un-blur shots and adjust lighting.
It's frustrating that it doesn't show the source image at high DPI, the reduced image, and then the result image. Some of the faces seem a bit off, and I imagine they are all pretty far off, but I don't know these people or the images. Still impressive, of course.
I'm assuming that the low-res images used were created from the high-res images, which might imply that the process could be reversible if the AI/ML or some algo could learn how to reverse it.
I wonder, if my assumption above is correct, how this would behave if the image was low-res to begin with, for whatever reason. Would it perform at the same level?
It’s probably a tool that could be used for generating a likely appearance for helping to find a person to interrogate, similar to eye witness sketches. I would hope it’s never allowed at trial though.
Prosecutors would need to be able to defend every element they use to gather evidence in court. Nothing downstream from inadmissible evidence can be admissible in court. “Fruit of the poisonous tree.”
No. Actually, that’s not true. If you upscaled a picture of a drug deal and then used that to narrow your search set and then posted a watch on that set to catch them dealing drugs, you’ve got a useful case. Totally admissible.
It is genuinely alarming to me how many people in this thread are saying that this will be a boon for crime investigation, medical imaging or the like.
To spell it out: This is not re-creating what was actually present in the original image. That is, and will always be, impossible due to fundamental limits on the information in the source image. What it is doing is using AI hallucinations to create a believable-looking fake.
Need to be careful what data the AI model is trained on. I.e., it could bias the generated features toward a particular race, causing those people to be held as suspects more often.