Amazon’s New AI Voice Model Makes Convos Feel Mighty Real

Amazon’s head AGI scientist told TechCrunch that Nova Sonic is top-of-class when it comes to conversational flow.

Photo of Amazon CEO Andy Jassy by JD Lasica via CC BY 2.0

Better watch your tone: Amazon’s latest AI voice model is a bit of an empath. Yesterday Amazon debuted Nova Sonic, an AI assistant that the tech company said can tell how the people it chats with are feeling.

Amazon’s head AGI scientist told TechCrunch that Nova Sonic is top-of-class when it comes to conversational flow and knowing what action users want it to take (opening an app, looking up flights online), not to mention quicker-witted. The exec said Nova Sonic is nearly 50% better at listening (it mishears fewer words) than OpenAI’s voice model GPT-4o, quicker to respond, and cheaper to run.

Nova Sonic is already being used to power Amazon’s legacy voice assistant, Alexa. But even without an Echo on their nightstands, people could find themselves chatting with the bot. Nova Sonic is available to third-party developers, who can deploy it for tasks like handling customer service calls or tutoring language-learners. 

Chat, Is This Real?

Alexa and Siri have been telling people, “Sorry, I can’t help with that” for over a decade. But now the pressure is building for voice assistants to be more useful. Innovation in the space was spurred on last spring by OpenAI:

  • The launch of its voice assistant GPT-4o prompted comparisons with ScarJo’s character in “Her.”
  • Following OpenAI’s bot-drop, Google launched Gemini Live to let users chat aloud with its chatbot, Gemini. 

The momentum has kept up since: Google added eight new voices to its Chirp 3 AI model last month as tech companies fight to edge out rivals. Also last month, people were unnerved by how lifelike startup Sesame’s AI assistant sounds. 

Sesame’s co-founder said it’s still in the uncanny valley, but believes the tech can climb out. Some critics don’t think it should. 

Keep it Uncanny: AI audio models are becoming so lifelike that critics are concerned people won’t be able to tell when they’re chatting with a bot. Their fears go beyond callers falling in love with AI customer-service agents: Advanced AI audio tech could be misused for social engineering schemes. Bots could be tweaked to sound like family members, celebs, or politicians, and scammers could use those friendly voices to ask for money and data. OpenAI held back a wide release of its voice-cloning tech over such concerns.
