Microsoft takes on OpenAI's Sora with a cutting-edge AI tool capable of turning a static image into a 'Talking Tom'

 Robot standing in front of city with Microsoft logo.

What you need to know

  • Microsoft has unveiled VASA, a new AI framework that turns a static image and a speech audio clip into a short talking-face video.

  • The framework supports 512x512 videos at up to 40 FPS with negligible latency.

  • Microsoft is exploring different avenues to ensure the tool is used responsibly before releasing it to the general public.


Microsoft recently unveiled VASA — a new framework that generates "lifelike talking faces of virtual characters with appealing visual affective skills (VAS), given a single static image and a speech audio clip."

VASA-1 can transform a static image into a short clip by producing lip movements that closely synchronize with a speech audio clip. It makes the AI-generated result lifelike by "capturing a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness."

Will Microsoft's VASA fuel widespread deepfakes?

AI deepfake

With the rise of AI, deepfakes have become more common across social media platforms, and AI-generated misinformation about elections is widespread. Now that a sophisticated tool such as VASA-1 can deliver high-quality video with lifelike facial and head dynamics from a static image, a major concern is how it will affect the credibility of news and information on the internet.

The tool supports 512x512 videos at up to 40 FPS with negligible latency. As it happens, I recently stumbled on a video on LinkedIn that resembled Microsoft's VASA-generated clips. I noticed the video was rather off in some respects, such as the tone and the lip and head movements.

As more people embrace AI, tools like VASA and Image Creator from Designer will get better at generating images and clips. Such tools are already raising concerns among professionals in the built environment industry, because they are good at generating structural designs and could render those professionals' roles obsolete.

We recently reported on a bizarre incident where a popular Canadian rapper used AI to generate a verse in a deceased rapper's voice, without the approval of the late rapper's estate, and featured it in a track. As with the LinkedIn video, something was off: the flow on the diss track was wrong, even though the deceased rapper's voice was uncanny.

Microsoft indicates it has no plans to release "an online demo, API, product, additional implementation details, or any related offerings" until it has robust measures in place to ensure the tool is used responsibly.