Google is taking another crack at text-to-video generation with Lumiere, a new AI model capable of creating surprisingly high-quality content.
The tech giant has certainly come a long way from the days of Imagen Video. Subjects in Lumiere videos are no longer nightmarish creatures with melting faces. Now things look much more realistic. Sea turtles look like sea turtles, fur on animals has the right texture, and people in AI clips have genuine smiles (for the most part). What’s more, there’s very little of the weird jerky movement seen in other text-to-video generative AIs. Motion is largely smooth as butter. Inbar Mosseri, Research Team Lead at Google Research, published a video on her YouTube channel demonstrating Lumiere’s capabilities.
Google put a lot of work into making Lumiere’s content appear as lifelike as possible. The dev team accomplished this by implementing something called Space-Time U-Net architecture (STUNet). The technology behind STUNet is pretty complex. But as Ars Technica explains, it allows Lumiere to understand where objects are in a video and how they move and change, and to render these actions at the same time, resulting in a smooth-flowing creation.
This runs contrary to other generative platforms that first establish keyframes in clips and then fill in the gaps afterward. Doing so results in the jerky movement the tech is known for.
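To get a feel for why the keyframe-first approach can look jerky, here is a toy Python sketch. It is not Lumiere's code or any other model's: the sine-wave "motion", the frame counts, and every function name are made up for illustration. It simply compares positions computed for every frame against positions computed only at keyframes and then filled in with straight-line interpolation.

```python
import math

def true_position(frame, total_frames):
    """Hypothetical ground-truth motion: an object swinging along a sine path.

    Assumes total_frames >= 2.
    """
    return math.sin(2 * math.pi * frame / (total_frames - 1))

def keyframe_then_fill(total_frames, step):
    """Compute positions only at keyframes, then linearly interpolate the gaps."""
    keys = list(range(0, total_frames, step))
    if keys[-1] != total_frames - 1:
        keys.append(total_frames - 1)  # always treat the last frame as a keyframe
    positions = [0.0] * total_frames
    for start, end in zip(keys, keys[1:]):
        p0 = true_position(start, total_frames)
        p1 = true_position(end, total_frames)
        for f in range(start, end + 1):
            t = (f - start) / (end - start)
            positions[f] = p0 + t * (p1 - p0)
    return keys, positions

def all_frames_at_once(total_frames):
    """Stand-in for handling the whole clip together: every frame gets its real position."""
    return [true_position(f, total_frames) for f in range(total_frames)]

TOTAL_FRAMES = 17  # illustrative clip length
KEY_STEP = 8       # illustrative gap between keyframes

keys, filled = keyframe_then_fill(TOTAL_FRAMES, KEY_STEP)
full = all_frames_at_once(TOTAL_FRAMES)
errors = [abs(a - b) for a, b in zip(filled, full)]

print("keyframes:", keys)
print("largest gap between filled-in and true position:", round(max(errors), 3))
```

In this made-up setup the keyframes happen to land on moments when the object is at rest, so the filled-in version misses the swing that happens between them. A real video model is far more sophisticated than straight-line interpolation, but the basic weakness is the same: whatever happens between keyframes has to be guessed after the fact.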
In addition to text-to-video generation, Lumiere has numerous features in its toolkit, including support for multimodality.
Users will be able to upload source images or videos to the AI so it can edit them according to their specifications. For example, you can upload an image of Girl with a Pearl Earring by Johannes Vermeer and turn it into a short clip where she smiles instead of blankly staring. Lumiere also has an ability called Cinemagraph, which can animate highlighted portions of pictures.
Google demonstrates this by selecting a butterfly sitting on a flower. Thanks to the AI, the output video has the butterfly flapping its wings while the flowers around it remain stationary.
Things become particularly impressive when it comes to video. Video Inpainting, another feature, functions similarly to Cinemagraph in that the AI can edit portions of clips. A woman’s patterned green dress can be turned into shiny gold or black. Lumiere goes one step further by offering Video Stylization for altering video subjects. A regular car driving down the road can be turned into a vehicle made entirely out of wood or Lego bricks.
Still in the works
It’s unknown if there are plans to launch Lumiere to the public or if Google intends to implement it as a new service.
We could perhaps see the AI show up on a future Pixel phone as the evolution of Magic Editor. If you’re not familiar with it, Magic Editor utilizes “AI processing [to] intelligently” change spaces or objects in photographs on the Pixel 8. Video Inpainting, to us, seems like a natural progression for the tech.
For now, it looks like the team is going to keep it behind closed doors. As impressive as this AI may be, it still has its issues. Jerky animations still crop up in some clips. In other cases, subjects have limbs warping into mush. If you want to know more, Google’s research paper on Lumiere can be found on Cornell University’s arXiv website. Be warned: it’s a dense read.