
Google Wants to Fix a Lot About Video

From avatars’ face movements to audio enhancement, Google is seeking to patent methods for improving video calls.

Photo of a Google patent
Photo via U.S. Patent and Trademark Office


Video calls have transformed the workplace, but they're not without their drawbacks: connection issues, equipment malfunctions and more.

Google is expressing interest in making calls a little clearer with two new patent applications.

In one, the Mountain View, California-based company is seeking to patent a method for “modifying a facial feature of an avatar” to make its speech look more realistic when callers rely on animated representations of themselves instead of live video.

First, the system analyzes a voice recording and identifies moments where the sounds change because of a new word or syllable. Then, it determines what an avatar’s mouth should be doing at the moment any of those variations occur: opening, closing or changing shape. There’s a particular focus on vowel sounds, because they usually have smoother transitions. 

The Natural Look

Based on all of that collected information, the system updates the avatar's face, including its lip shapes and mouth positions, so it better mimics saying the words.

That enables digital characters to speak in a way that looks more natural, with their facial movements better synced with the actual sounds coming from their mouths.
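As a rough illustration of the idea described above, and not the method in Google's filing, the sketch below splits audio into short frames, flags frames where the loudness changes sharply as candidate sound transitions, and maps each frame's loudness to a mouth-opening value between 0 and 1. The function names, the energy-based heuristic and the thresholds are all assumptions made for this example.

```python
import math


def frame_energies(samples, frame_size):
    """Return the mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(sum(s * s for s in frame) / frame_size)
    return energies


def find_transitions(energies, threshold):
    """Return indices of frames whose energy jumps or drops by more than
    `threshold` relative to the previous frame (hypothetical heuristic)."""
    return [
        i for i in range(1, len(energies))
        if abs(energies[i] - energies[i - 1]) > threshold
    ]


def mouth_openness(energies):
    """Map each frame's energy to a 0..1 mouth-opening value,
    relative to the loudest frame (assumed mapping, for illustration)."""
    peak = max(energies, default=0.0)
    if peak == 0:
        return [0.0] * len(energies)
    return [math.sqrt(e / peak) for e in energies]


# Toy signal: silence, then a loud sound, then a quieter one.
samples = [0.0] * 4 + [1.0] * 4 + [0.5] * 4
energies = frame_energies(samples, frame_size=4)
transitions = find_transitions(energies, threshold=0.1)
openness = mouth_openness(energies)
```

A real system would presumably work from phonetic features rather than raw loudness, especially given the filing's emphasis on vowel sounds. The sketch only shows the general shape of the pipeline: detect changes in the audio, then drive the avatar's mouth from them.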

For callers who use an avatar and those who put their own face front and center alike, Google is also looking to patent a method to alter a video's audio "to increase understandability." The aim is to improve how people sound on calls so that there's no trouble understanding them.

During a video call, the speaker’s voice is transmitted through the meeting software as audio. With the method Google wants to patent, the system will recognize that the speaker is hard to understand (whether the reason is background noise, bad audio quality or unclear speech) and run the audio through an AI model. 

The model then modifies the audio to make it easier to understand, whether that involves clearing up speech, adjusting the volume, or removing background noise. Afterward, it transmits the improved audio to other meeting participants in real time.
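The flow described above (detect that a speaker is hard to understand, enhance the audio, then pass it on) can be sketched in a few lines. This is a simplified illustration, not Google's implementation: a plain loudness check stands in for the detection step, and a simple volume adjustment stands in for the AI model. All names and threshold values are assumptions.

```python
import math


def rms(samples):
    """Root-mean-square level of an audio chunk (0.0 for an empty chunk)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def needs_enhancement(samples, min_rms=0.1):
    """Hypothetical detector: treat very quiet audio as hard to understand."""
    return rms(samples) < min_rms


def enhance(samples, target_rms=0.2):
    """Placeholder for the AI model: scale the chunk toward a target
    loudness and clip the result to the valid range [-1.0, 1.0]."""
    level = rms(samples)
    if level == 0:
        return list(samples)
    gain = target_rms / level
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


def process_chunk(samples):
    """Enhance only the chunks flagged as hard to understand."""
    return enhance(samples) if needs_enhancement(samples) else list(samples)


quiet = [0.01, -0.01, 0.01, -0.01]
loud = [0.5, -0.5, 0.5, -0.5]
boosted = process_chunk(quiet)    # quiet speech is amplified
unchanged = process_chunk(loud)   # clear speech passes through as-is
```

In a live call, a function like `process_chunk` would run on each small slice of audio before it is sent to the other participants, which is what makes the real-time requirement demanding.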

The two patent applications are part of Google's bid for productivity tech, particularly meeting tech, as it competes with Zoom and Microsoft. Microsoft, in fact, is already seeking to read facial expressions during meetings, and Zoom is working on a work-life balancer.

With numerous meeting options available in a more remote world, the productivity race is on.
