Google Patent Highlights Costs and Benefits of Generative AI Video

Tons of companies are racing to build an AI video-generation engine that actually works.

October 14, 2024

Photo via U.S. Patent and Trademark Office

The tech industry is in a mad dash to make generative AI bigger and better, and Google might be looking to stake its claim in IP before it’s too late.

The company wants to patent a system for “generative videos using diffusion models.” A diffusion model starts from random noise and refines it over many iterations until a coherent output emerges, which makes such models a good fit for generating image and audio data.
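To make the idea of iterative refinement concrete, here is a toy sketch in Python. It is not Google's method and not a real diffusion model: the function names are hypothetical, and the "denoiser" is a stand-in that already knows the target, whereas a real model learns to predict the noise from training data. It only illustrates the loop structure of starting from noise and cleaning it up step by step.

```python
import random


def toy_denoise_step(sample, target, strength):
    """Hypothetical stand-in for a learned denoiser.

    A real diffusion model predicts noise with a neural network; this toy
    version simply nudges each value a fraction of the way toward a known target.
    """
    return [s + strength * (t - s) for s, t in zip(sample, target)]


def refine(target, steps=10, strength=0.5, seed=0):
    """Start from Gaussian noise and refine it over several iterations."""
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, 1.0) for _ in target]  # pure noise to begin with
    errors = []
    for _ in range(steps):
        sample = toy_denoise_step(sample, target, strength)
        # Track the largest remaining gap between the sample and the target.
        errors.append(max(abs(s - t) for s, t in zip(sample, target)))
    return sample, errors


if __name__ == "__main__":
    pixels = [0.2, 0.8, 0.5]  # assumed "clean" values, for illustration only
    result, errors = refine(pixels)
    for step, err in enumerate(errors, start=1):
        print(f"step {step}: max error {err:.5f}")
```

Each pass leaves the sample a little closer to a clean result, which is why diffusion models are usually described as refining data over multiple iterations rather than producing it in one shot.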

Google claims its system can create longer videos of “higher perceptual quality” than other systems, which typically lack “temporal coherence between the input video and the generated frames.”

Unlike conventional systems, which generate videos by predicting frames one at a time, this tech models entire videos using “3D video architecture,” an approach that treats a video as a three-dimensional structure (height, width, and time) rather than a series of flat frames, and models multiple frames simultaneously within that structure. Google says this makes videos more natural-looking and coherent.
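A rough way to picture the difference is an operation that looks across time as well as across each frame. The Python sketch below is an illustration only, not the architecture described in the filing: it stores a tiny grayscale video as a nested list indexed by frame, row, and column (an assumed layout), and averages each value with its neighbors in all three dimensions, so information from adjacent frames is used together.

```python
def average_3d(video, t, y, x):
    """Average the values in the 3x3x3 neighborhood around (t, y, x).

    The neighborhood spans time as well as space and is clipped at the borders.
    """
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    values = [
        video[tt][yy][xx]
        for tt in range(max(0, t - 1), min(frames, t + 2))
        for yy in range(max(0, y - 1), min(height, y + 2))
        for xx in range(max(0, x - 1), min(width, x + 2))
    ]
    return sum(values) / len(values)


def smooth_video(video):
    """Apply the 3D average to every position, treating the video as one volume."""
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    return [
        [[average_3d(video, t, y, x) for x in range(width)] for y in range(height)]
        for t in range(frames)
    ]


if __name__ == "__main__":
    # Three 2x2 frames; the middle frame "flickers" brighter than its neighbors.
    dark = [[0.0, 0.0], [0.0, 0.0]]
    bright = [[3.0, 3.0], [3.0, 3.0]]
    video = [dark, bright, dark]
    for index, frame in enumerate(smooth_video(video)):
        print(f"frame {index}: {frame}")
```

A frame-by-frame system would process the flickering middle frame in isolation. Because the operation here spans neighboring frames, the sudden jump in brightness is softened, which is a simplified version of the temporal coherence the filing is after.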

Google notes that this model is trained on both images and video, giving it a wider pool of high-quality, diverse data to learn from, and improving its performance. Google claims this training method is more computationally efficient, as it helps a model reach its peak performance more quickly. This tech also saves energy with simpler calculations and better memory storage techniques, Google notes.

Tons of companies are racing to build an AI video-generation engine that actually works. OpenAI has been developing a text-to-video model, Sora, which it began teasing back in February. Earlier this month, Meta announced Movie Gen, which can create clips from text prompts and videos from custom images. (Neither of these models is publicly available.)

In August, TikTok’s parent company ByteDance launched a video-generation app called Jimeng AI for users based in China. And Google, of course, isn’t left out of the fun: The company unveiled Veo, its own AI video generator, which it will bring to YouTube Shorts in 2025 to generate clips up to six seconds long.

Why are these companies so interested in building high-powered content generators? While at least part of the answer is simply to see if they’re capable, a larger reason could be to draw in more creators and larger audiences, said Thomas Randall, advisory director at Info-Tech Research Group.

“They’re all competing to get the largest audience, because that’s where they’ll get enough data to use for advertisement revenue. That’s a huge source of income for them,” Randall said. Because of this, making these experiences seamless and engaging could present major revenue opportunities.

However, like any AI model, these video generators take an astronomical amount of resources to create, said Randall, with likely “billions of dollars in total” going toward power, data center demands, and manpower for research. Cost aside, access to high-quality data can also mean the difference between realistic outputs and ones that land in the uncanny valley.

“To make content useful and valuable, it needs to have the right kind of data going into it,” said Randall.