> Therefore, on average, unless the lossless algorithm has a perverse encoding that uses more bits than necessary to encode the possible outputs, no lossless algorithm can beat it on average
I think you mean "unless the lossy algorithm has a perverse encoding."
But your argument proves too much. It's true that no lossless algorithm can beat a lossy algorithm if averaged across all possible images, for precisely the reason you say. But for the same reason, no lossless algorithm can beat the "compression algorithm" of no compression at all, like BMP but without redundancy in the header, averaged across all possible images.
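To make the counting argument concrete, here's a toy Python sketch (not a proof, and the two-byte "images" are obviously unrealistically small): averaged over every possible input of a given size, a real lossless coder can't beat just storing the raw bytes.

    import zlib

    n_bytes = 2                                  # toy "image" size
    total_raw = total_compressed = 0
    for value in range(256 ** n_bytes):          # every possible 2-byte input
        raw = value.to_bytes(n_bytes, "big")
        total_raw += len(raw)
        total_compressed += len(zlib.compress(raw))

    print("average raw size        :", total_raw / 256 ** n_bytes)         # 2.0 bytes
    print("average compressed size :", total_compressed / 256 ** n_bytes)  # > 2.0 bytes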
The unknown question — which I think we don't even have a good estimate of — is how much incompressible information is in the images we're interested in compressing. Some of that information is random noise, which lossy algorithms like JPEG can win by discarding. (Better algorithms than JPEG can even improve image quality that way.) But other parts of that information are the actual signal we want to preserve; as demonstrated by Deep Dream, JPEG doesn't have a particularly good representation of that information, but we have reason to believe that on average it is a very small amount of information, much smaller than JPEG files.
If it turns out that the actually random noise is smaller than the delta between the information-theoretic Kolmogorov complexity of the signal in the images we're interested in compressing and the size of typical JPEG files for them, then a lossless algorithm could beat JPEG, as typically applied, on size. Of course, a new lossy algorithm that uses the same model for the probability distribution of images, and simply discards the random noise rather than storing it as a lossless algorithm must do, would do better still.
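Put another way, here is a rough sketch of the bookkeeping, with hypothetical variable names; none of these quantities is something we can actually measure today:

    # K_signal : Kolmogorov complexity of the signal we care about, in bits
    # H_noise  : bits of genuinely random noise a lossless coder must also store
    # jpeg_bits: size of a typical JPEG of the same photo, in bits
    def lossless_can_beat_jpeg(K_signal, H_noise, jpeg_bits):
        # a lossless coder needs roughly K_signal + H_noise bits,
        # so it wins on size only when that total is below the JPEG size
        return K_signal + H_noise < jpeg_bits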
We are seeing this in practice for audio with LPCNet, where lossy compression is able to produce speech that is more easily comprehensible than the original recording: https://people.xiph.org/~jm/demo/lpcnet_codec/
It seems unlikely to me that the ideal predictor residual, or random noise, in an average photo (among the photos we're interested in, not all possible Library-of-Borges photos) is less than the size of a JPEG of that photo. But we don't have a good estimate of the Kolmogorov complexity of typical photos, so I'm not comfortable making a definite statement that this is impossible.
For example, one unknown component is how random camera sensor noise is. Some of it is pure quantum randomness, but other parts are a question of different gains and offsets (quantum efficiencies × active sizes and dark currents) in different pixels. It might turn out that those gains and offsets are somewhat compressible because semiconductor fabrication errors are somewhat systematic. And if you have many images from the same camera, you have the same gains and offsets in all of the images. If your camera is a pushbroom-sensor type, it has the same gains and offsets in corresponding pixels of each row, and whether it does or not, there's probably a common gain factor per pixel line, related to using the same ADCs or being digitized at the same time. So if your lossless algorithm can model this "random noise", it may be able to cut it in half or more.
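Here's a rough numerical sketch of that idea (no real codec here, and all the noise magnitudes are invented): if the per-pixel gains and offsets really are fixed-pattern properties of the camera, a predictor that carries a per-camera calibration only has to entropy-code a much smaller residual.

    import numpy as np

    rng = np.random.default_rng(0)
    h, w, frames = 64, 64, 16

    gain   = 1 + 0.05 * rng.standard_normal((h, w))     # hypothetical fixed-pattern gain
    offset = 2.0 * rng.standard_normal((h, w))          # hypothetical dark-signal offset
    scene  = 100 * rng.random((frames, h, w))           # stand-in for the true signal
    shot   = 0.5 * rng.standard_normal((frames, h, w))  # the genuinely random part

    raw = gain * scene + offset + shot                  # what the sensor reports

    # residual a coder must store if it predicts the scene but not the fixed pattern,
    # versus one that also models the per-pixel gain and offset
    residual_naive      = raw - scene
    residual_calibrated = raw - (gain * scene + offset)

    print("residual std, no fixed-pattern model :", residual_naive.std())
    print("residual std, with per-pixel model   :", residual_calibrated.std())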
> I think you mean "unless the lossy algorithm has a perverse encoding."
That's right.
> It's true that no lossless algorithm can beat a lossy algorithm if averaged across all possible images
I concede the point. I thought that this argument applied to the average of any input set, not just the set of all possible images, but then I realized that it's actually pretty easy to come up with counterexamples: e.g. a set of N input images, each of which can be losslessly compressed to log2(N) bits, which will easily outperform JPEG for small N. That's not a very realistic situation, but it does torpedo my "proof".
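Something like this, where the "compressed image" is just an index into the agreed-upon set (the placeholder strings stand in for actual image bytes):

    import math

    def compress(image, known_images):
        return known_images.index(image)      # the code word is just a position in the list

    def decompress(index, known_images):
        return known_images[index]

    known_images = ["imgA-bytes", "imgB-bytes", "imgC-bytes", "imgD-bytes"]  # N = 4 placeholders
    code = compress("imgC-bytes", known_images)
    assert decompress(code, known_images) == "imgC-bytes"
    print("bits per image:", math.ceil(math.log2(len(known_images))))       # 2 bits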
It's probably still true that no lossless format could possibly beat lossy JPEG for typical photos.
If I understand your example correctly, it only has the lossless algorithm beating JPEG by cheating: the lossless algorithm contains a dictionary of the possible highly compressible images in it, so the algorithm plus the compressed images still weighs more than libjpeg plus the JPEGs. But there are other cases where the lossless algorithm doesn't have to cheat in this way; for example, if you have a bunch of images generated from 5×5-pixel blocks of solid colors whose colors are generated by, say, a known 68-bit LFSR, you can code one of those images losslessly in about 69 bits, but JPEG as typically used will probably not compress them by an atypical amount. Similarly for images generated from the history of an arbitrary 1-D CA, or—to use a somewhat more realistic example—bilevel images of pixel-aligned monospaced text in a largish font neither of whose dimensions is a multiple of 8, say 11×17. All of these represent images with structure that can be straightforwardly extracted to find a lossless representation that is more than an order of magnitude smaller than the uncompressed representation, but where JPEG is not good at exploiting that structure.
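For concreteness, here's a toy version of the LFSR case (the tap positions are placeholders rather than a known maximal-length polynomial, and I'm rounding the roughly 69-bit code up to 9 whole bytes): the entire image is determined by the seed, so a coder that knows the generator only has to transmit those 68 bits.

    def lfsr_colors(seed, nbits=68, taps=(68, 67, 61, 60)):
        state = seed
        while True:
            bit = 0
            for t in taps:                          # Fibonacci LFSR step
                bit ^= (state >> (nbits - t)) & 1
            state = ((state << 1) | bit) & ((1 << nbits) - 1)
            yield state & 0xFFFFFF                  # low 24 bits = RGB color of one 5x5 block

    def encode(seed):
        return seed.to_bytes(9, "big")              # 68 bits, padded to 9 bytes: the whole "file"

    def decode(code, n_blocks):
        gen = lfsr_colors(int.from_bytes(code, "big"))
        return [next(gen) for _ in range(n_blocks)]

    seed = 0x5EED5EED5EED5EED5 & ((1 << 68) - 1)    # arbitrary 68-bit seed
    print(len(encode(seed)), "bytes describe every pixel")
    print(decode(encode(seed), 4))                  # colors of the first four 5x5 blocks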