While there is only one dominant player (OpenAI), I would say we can, but if it gets to the point where we have multiple players, I suppose it would be extremely hard without proper controls already in place.
To me this is quite an interesting idea. I am not sure if now is the correct time to pause it, or how we would spot when the correct time is. What are your thoughts?
Yes, the proof... Actually, is there some diff tool to compare models before and after they process some source? I'm not sure, but it must be possible to detect pieces of some ingested data in the model itself. I've seen the famous "wolf misdetection" investigation screenshots, where the AI apparently labeled a dog as a wolf because there was snow in the background of the picture.
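Not a real provenance tool, but as a toy illustration: if you had the same model's weights from before and after it was trained on some source, you could at least see which parameters moved. A minimal sketch, assuming two PyTorch checkpoints saved as plain state dicts under the hypothetical names "before.pt" and "after.pt":

    import torch

    # Hypothetical checkpoint files for the same architecture: one snapshot
    # taken before and one after further training on the data in question.
    before = torch.load("before.pt", map_location="cpu")
    after = torch.load("after.pt", map_location="cpu")

    for name, w_before in before.items():
        # L2 norm of how far each weight tensor moved during that training run
        delta = (after[name].float() - w_before.float()).norm().item()
        if delta > 0:
            print(f"{name}: weights moved by {delta:.6f} (L2 norm)")

That tells you something changed, but tying the change back to one specific image is exactly the hard part.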
For Stable Diffusion, I think the number of bits in the model divided by the number of training images is on the order of 6-8 bits per image. There is no "storage" of the training images. It's 250 TB of data in, and roughly 1.4 GB in the weight file, depending on the precision. I think those 250 TB are compressed as well, so maybe 25,000 TB of raw data distilled down to 1.4 GB. I'm fairly certain you could never prove an AI saw your image. You'd have to sue the company and look at their training data through discovery.
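Rough arithmetic behind that figure; the 1.4 GB is the weight-file size from above, and the image count is my assumption (roughly the 2-billion-image LAION scale SD is reported to have trained on):

    # Back-of-envelope: bits of model capacity per training image.
    model_bytes = 1.4e9   # ~1.4 GB weight file
    num_images = 2e9      # assumed training set size, ~LAION-2B scale

    bits_per_image = model_bytes * 8 / num_images
    print(f"~{bits_per_image:.1f} bits per training image")  # ~5.6 bits

Single-digit bits per image either way, i.e. nowhere near enough to store the images themselves.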
There are probably pathological cases where an image that repeats in the training data is more strongly overfit and could be reproduced in much more detail than this average suggests, though. But these systems learn similarly to the human brain: they learn the gist of a style or scene and how it relates to words. It's not a search engine; it doesn't copy/paste any block of pixels...
One interesting example: since SD's original training set included some watermarked stock photos, it learned that there is a concept of a watermark, which can end up in the middle of generated images. Not in an intelligible way, but you can see roughly how it interpreted this detail. And in those cases you DO have a very similar, frequently repeated pixel bitmap in the training data.