The speech recognition dilemma

Plus: Anduril and Boeing’s Game of Drones; Intel’s plan to break down communication barriers

Happy Thursday and welcome to Patent Drop! 

This morning, we’re taking a look at why speech recognition matters so much to Big Tech companies, and the barriers that stand in the way. Plus, we’ll check out Boeing and Anduril’s new drones and Intel’s plan to make communication easier for the hearing impaired.

But before we get into it, we wanted to ask: How do you stay competitive in today’s job market? Today’s sponsor, Brilliant, is helping millions master essential concepts in math, AI, computer science, and more in just minutes a day. Ditch the lecture videos and opt for interactive learning: try Brilliant free for 30 days and score 20% off your premium subscription today.

Let’s take a peek. 

#1. Say that again? 

Tech companies seem to be really keen on making sure they hear you right. 

To start, Baidu filed a patent application for “sound source localization tech,” which, as the name implies, helps a digital assistant determine the “direction of a sound source.” Using voice processing and deep learning, the system takes in audio from users’ commands and requests, labels each clip with the direction it came from, and trains a neural network on those labeled clips. Pinpointing exactly where a sound is coming from helps the virtual assistant respond more accurately and more quickly.
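Baidu’s filing describes a learned approach, but the underlying idea is easier to see with a classic non-neural baseline. The sketch below swaps in time-difference-of-arrival (TDOA) estimation: with two microphones a known distance apart, the lag that best lines up the two recordings tells you how much earlier the sound hit one mic, and simple geometry turns that lag into an angle. This is not Baidu’s method; the function names, the two-mic setup, and the 343 m/s speed of sound are all illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C (assumed)


def estimate_delay(left, right, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive lag means the sound reached `left` first and `right` later.
    Hypothetical helper, for illustration only.
    """
    n = len(left)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Slide `right` against `left` and sum the products of aligned samples.
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_of_arrival(left, right, sample_rate, mic_spacing):
    """Angle in degrees from broadside; positive means toward the `left` mic."""
    # The largest physically possible delay is spacing / speed of sound.
    max_lag = int(mic_spacing / SPEED_OF_SOUND * sample_rate)
    tau = estimate_delay(left, right, max_lag) / sample_rate
    # Clamp to [-1, 1] so rounding never pushes asin out of its domain.
    ratio = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_spacing))
    return math.degrees(math.asin(ratio))
```

A learned localizer like the one in the filing would replace the hand-written correlation step with a network trained on direction-labeled clips, which tends to hold up better in echoey rooms where a single correlation peak can be misleading.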

Separately, Baidu also seeks to patent a system for “synthesizing speech,” which essentially reduces excess noise in audio data so the assistant can better understand a user’s commands. For example, if you ask a smart home device to call a friend, this system should be better able to filter out the noise of your dishwasher, your TV and your upstairs neighbor stomping about.
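The filing doesn’t spell out a specific filter, so here is a deliberately simple stand-in for “reducing excess noise”: an energy-based noise gate. It assumes the first few frames contain only background noise, uses their loudness as a noise floor, and turns down any later frame that isn’t clearly louder than that floor. Everything here (function names, frame size, the 2x threshold, the 0.1 attenuation) is a hypothetical choice for illustration, not Baidu’s technique.

```python
import math


def rms(frame):
    """Root-mean-square loudness of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0


def noise_gate(samples, frame_size, noise_frames=5, threshold=2.0, attenuation=0.1):
    """Attenuate frames that are not clearly louder than the estimated noise floor.

    Assumes the first `noise_frames` frames are background noise only.
    """
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    lead = frames[:noise_frames]
    noise_floor = sum(rms(f) for f in lead) / len(lead) if lead else 0.0

    out = []
    for frame in frames:
        # Keep speech-like frames untouched; scale quiet (noise-only) frames down.
        gain = 1.0 if rms(frame) > threshold * noise_floor else attenuation
        out.extend(x * gain for x in frame)
    return out
```

Real assistants typically work in the frequency domain (for example, spectral subtraction or a learned mask) so they can remove a dishwasher’s hum even while someone is talking; a frame-level gate like this one only helps between words.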

And of course, Baidu is not the only one working on this: Google filed a patent for “Multi-Talker Overlapping Speech Recognition,” which helps its smart devices straighten out commands when people just won’t stop talking over each other. The tech tracks the start and end times of multiple overlapping streams of speech and fills in the gaps when the audio gets too jumbled. The goal is to keep track of who is saying what, and whether any of the speakers is addressing the device.
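To make “tracking start and end times” a little more concrete, here is a small sketch of one sub-step a system like this might need: given who spoke when, find the stretches where two or more people were talking at once. It uses a standard sweep-line over start/end events. The segment format and the `overlapping_regions` function are hypothetical; Google’s filing doesn’t describe this code.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, float, float]  # (speaker, start_seconds, end_seconds)


def overlapping_regions(segments: List[Segment]) -> List[Tuple[float, float, Tuple[str, ...]]]:
    """Return (start, end, speakers) for every span where 2+ speakers are active."""
    events = []
    for speaker, start, end in segments:
        events.append((start, 1, speaker))   # speaker starts talking
        events.append((end, -1, speaker))    # speaker stops talking
    # At equal timestamps, process "stop" (-1) before "start" (+1),
    # so back-to-back turns are not counted as overlap.
    events.sort(key=lambda e: (e[0], e[1]))

    active: Dict[str, int] = {}
    regions = []
    prev_time = None
    for time, delta, speaker in events:
        # Record the span since the previous event if several people were talking.
        if prev_time is not None and time > prev_time and len(active) >= 2:
            regions.append((prev_time, time, tuple(sorted(active))))
        if delta == 1:
            active[speaker] = active.get(speaker, 0) + 1
        else:
            active[speaker] -= 1
            if active[speaker] == 0:
                del active[speaker]
        prev_time = time
    return regions
```

For example, if Alice talks from 0 to 3 seconds, Bob from 2 to 5 and Carol from 4 to 6, this returns the two crosstalk windows (2 to 3 seconds for Alice and Bob, 4 to 5 seconds for Bob and Carol), which are exactly the spans a recognizer would need to untangle.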

So if you decide to host a dinner party with your friends, and your parents, and your cousins and your in-laws, your smart speaker will be able to hear you over the cacophony if you ask it to play music or set a timer.