My constructive criticism of this video: 90% of it is figures standing still while wind blows their outfits or the camera does a simple move. Sometimes moving their lips as though talking... though I did like that bird turning its head smoothly away like "forget this, I'mma preen! Peace out!" Haha
Very much no throughline of concepts from one shot to the next. You never see the same character twice. No foreground dynamic action... not even simple walking, except one far-away character walking directly away from the camera, which means their silhouette hardly changes.
This all comes from the current generation of video diffusion models, which basically just generate an image like they always have; with a hint of temporal coherence bolted on, they expand that into a short shot with no movement except the kinds seen a million times in their training set.
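To make the "image model plus a hint of temporal coherence" point concrete, here's a minimal toy sketch in Python/PyTorch. All names, shapes, and layers are my own invention for illustration, not any real model's architecture: the spatial path is exactly what an image diffusion model does per frame, and the only cross-frame machinery is one attention layer over the frame axis.

    # Toy sketch: denoise a video by treating it as a batch of images,
    # with a single temporal-attention layer so neighboring frames stay
    # loosely consistent. Hypothetical names/shapes, not a real model.
    import torch
    import torch.nn as nn

    class ToyVideoDenoiser(nn.Module):
        def __init__(self, channels=64):
            super().__init__()
            # Spatial path: per-frame processing, same as image diffusion.
            self.spatial = nn.Conv2d(channels, channels, 3, padding=1)
            # Temporal path: attention across frames only -- each pixel
            # location attends to itself in other frames, nothing more.
            self.temporal = nn.MultiheadAttention(channels, num_heads=4,
                                                  batch_first=True)

        def forward(self, x):  # x: (batch, frames, channels, height, width)
            b, f, c, h, w = x.shape
            # Fold frames into the batch for per-frame spatial work.
            x = self.spatial(x.reshape(b * f, c, h, w)).reshape(b, f, c, h, w)
            # Attend over the frame axis (sequence length = frame count).
            t = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, f, c)
            t, _ = self.temporal(t, t, t)
            return t.reshape(b, h, w, f, c).permute(0, 3, 4, 1, 2)

    noisy = torch.randn(1, 8, 64, 32, 32)   # 8 frames of latent "video"
    print(ToyVideoDenoiser()(noisy).shape)  # torch.Size([1, 8, 64, 32, 32])

Note there is nothing in this design that reasons about objects, physics, or a 3D scene; it only nudges frames toward agreeing with each other, which is exactly why you get wind-blown stillness rather than action.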
Getting generative models to reason better about motion, and to build mental world models of the 3D scene they are managing a 2D window into, is going to be a big challenge, and will require some additional breakthroughs on a par with the original GPT and Stable Diffusion breakthroughs that currently act as a foundation for the majority of modern AI innovation.
> ... and will require some additional breakthroughs on a par with the original GPT and Stable Diffusion breakthroughs ...
You say this like Stable Diffusion isn't a 2022 technology. And not early 2022, but quite late (August). ChatGPT is younger.
I mean, sure, we need more breakthroughs, but we've barely even seen a new hardware generation since those things came out, and researchers are really only getting started with the new capabilities of generative tech. If we don't get more breakthroughs in short order, that would be a stunning halt in progress, the likes of which we have almost never seen before. More breakthroughs are a given.