Look at how anyone who's not an artist (especially children) draws pretty much anything, whether it's people, animals, or bicycles. You'll see a lot of asymmetric faces, wrong shading, completely unrealistic hair, etc., and yet all those people are intelligent and excellent at understanding how the world works. Intelligence can't be expressed in a directly measurable way.
Those videos may be cherry-picked, but they're almost good enough that you have to go looking for problems to point out, problems that will likely be gone a couple of iterations later.
Less than a decade ago we were reading articles about Google's weird AI model producing super-trippy images of dogs; now we're at deepfakes that successfully scam companies out of millions, and AI models that can't yet reliably preserve consistency across an entire image. In another ten years, every new desktop, laptop, and smartphone will have an integrated AI accelerator, and most of those issues will be fixed.
Anthropomorphizing models lets assumptions about human behavior slip into these discussions. Most people don't even know how their own brains process information, and models are not children.
New models appear to be close, and based on the arguments here, closing the gap is simply a matter of time. But that conclusion comes from observing the rate of progress, not the actual underlying mechanisms. It's a tendency driven by assumptions about conscious human thinking, not the significant amount of processing we do unconsciously.
Sure, expanding the data set has improved results, yet bad hands, ghost legs, and other weirdness persist. If you have a world model, then this shouldn't happen: there is a logical rule to follow, not simply a pixel-level correlation.
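To make the distinction concrete, here's a deliberately crude, entirely made-up toy in Python (nothing here reflects any real generator's internals): a correlation-only sampler can at best make five-fingered hands *likely*, while anything deserving the name "world model" could enforce the rule outright.

```python
import random

FINGERS_PER_HAND = 5  # the logical rule a world model would encode

def correlation_sampler() -> int:
    # A statistical model only makes 5 the most likely finger count;
    # 4 and 6 still leak through, the "bad hands" failure mode.
    return random.choices([4, 5, 6], weights=[0.1, 0.8, 0.1])[0]

def world_model_sampler() -> int:
    # A rule-following model can enforce the constraint outright,
    # e.g. by rejecting samples until the rule holds.
    while True:
        fingers = correlation_sampler()
        if fingers == FINGERS_PER_HAND:
            return fingers

counts = [correlation_sampler() for _ in range(1000)]
print('correlation-only violations:', sum(c != 5 for c in counts))
print('rule-enforced sample:', world_model_sampler())
```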
Working from the other side: if image/video generation has reached the point of faithfully recreating 3D rules, then we should expect 3D wireframes to be generated too. We should see an update to https://github.com/openai/point-e.
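For reference, point-e's text-to-point-cloud pipeline looks roughly like the sketch below, adapted from the example notebook in that repo; config names and signatures may have drifted since, so treat it as illustrative rather than authoritative.

```python
import torch
from point_e.diffusion.configs import DIFFUSION_CONFIGS, diffusion_from_config
from point_e.diffusion.sampler import PointCloudSampler
from point_e.models.configs import MODEL_CONFIGS, model_from_config
from point_e.models.download import load_checkpoint

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Base model produces a coarse 1024-point cloud from the text prompt;
# a second model upsamples it to 4096 points.
base_model = model_from_config(MODEL_CONFIGS['base40M-textvec'], device)
base_model.eval()
base_model.load_state_dict(load_checkpoint('base40M-textvec', device))

upsampler = model_from_config(MODEL_CONFIGS['upsample'], device)
upsampler.eval()
upsampler.load_state_dict(load_checkpoint('upsample', device))

sampler = PointCloudSampler(
    device=device,
    models=[base_model, upsampler],
    diffusions=[
        diffusion_from_config(DIFFUSION_CONFIGS['base40M-textvec']),
        diffusion_from_config(DIFFUSION_CONFIGS['upsample']),
    ],
    num_points=[1024, 4096 - 1024],
    aux_channels=['R', 'G', 'B'],
    guidance_scale=[3.0, 0.0],
    model_kwargs_key_filter=('texts', ''),  # only the base model sees the prompt
)

samples = None
for x in sampler.sample_batch_progressive(
        batch_size=1, model_kwargs=dict(texts=['a red motorcycle'])):
    samples = x  # keep the output of the final denoising step

point_cloud = sampler.output_to_point_clouds(samples)[0]
```

The point: this pipeline exists and works at toy scale, so if 2D generators had truly internalized 3D structure, we'd expect rapid progress here rather than the repo sitting idle.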
This isn't splitting hairs: this behavior can be hand-waved away when building a PoC, but not when you're shipping a production-ready product.
As for scams, they target the weaknesses of a verification process, not its strongest parts, which makes them strawman arguments for AI capability.
>If you have a world model, then this shouldn't happen: there is a logical rule to follow, not simply a pixel-level correlation.
Oh, I guess humans don't have world models then. It's so weird seeing this rhetoric repeated again and again. No, a world model doesn't mean a perfect one, and it doesn't mean a logical one either. Humans clearly don't operate by logic.