DeepMind Patent Gives AI Robots ‘Inner Speech’
With AI-powered robotics and agents, more context is always better.

DeepMind wants to give its agents an inner monologue.
Google’s AI lab is seeking to patent “intra-agent speech to facilitate task learning,” a tool that would help AI agents or robots understand the world around them.
The system would take in images and videos of someone performing a task and generate natural language to describe what’s happening using a language model. For example, a robot might watch a video of someone picking up a cup, while receiving the input “the person picks up the cup.”
That allows it to take in what it “sees” and pair it with inner speech, or something it might “think.” The inner speech would reinforce which actions need to be taken when faced with certain objects.
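As a rough illustration of that pairing, and not the method described in the filing, here is a minimal Python sketch. It assumes a stand-in captioner in place of a real language model, and every name in it (Frame, TrainingExample, toy_captioner, pair_with_inner_speech) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Frame:
    """Stand-in for an image or video frame; a real system would hold pixels."""
    objects: List[str]
    action: str  # the action demonstrated in this frame


@dataclass
class TrainingExample:
    """One observation paired with generated inner speech and a target action."""
    frame: Frame
    inner_speech: str
    target_action: str


def toy_captioner(frame: Frame) -> str:
    """Hypothetical placeholder for a language model that describes a frame."""
    return f"the person {frame.action} the {frame.objects[0]}"


def pair_with_inner_speech(
    frames: List[Frame],
    captioner: Callable[[Frame], str] = toy_captioner,
) -> List[TrainingExample]:
    """Attach a natural-language description to each observed frame."""
    return [TrainingExample(f, captioner(f), f.action) for f in frames]


# Two demonstration frames, described only by labels (an assumption of this toy).
demo = [Frame(["cup"], "picks up"), Frame(["door"], "opens")]
examples = pair_with_inner_speech(demo)
for ex in examples:
    print(ex.inner_speech)
```

In this toy version the descriptions are built from labels rather than pixels; the point is only that each observation ends up stored alongside a sentence the agent could later use as context.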
The system’s key benefit is what’s known as “zero-shot” learning: the agent or robot can interact with objects that it hasn’t encountered before. Such systems “facilitate efficient learning by using language to help understand the world, and can thus reduce the memory and compute resources needed to train a system used to control an agent,” DeepMind said in the filing.
This isn’t the first time DeepMind has made its robotics goals known: Last week, the firm rolled out an on-device version of its vision language robotics model, one that can run without internet access. Google said it’s “small and efficient enough to run directly on a robot.” Like the patent’s design, the flagship model is engineered to help robots generalize to tasks on which they may not have been trained.
For AI-powered robotics and agents, more context is always better. Giving a robot or agent an inner monologue provides more data to train on and work with as well as better tools to understand unfamiliar situations.
The unpredictability of AI-powered robots remains a barrier to adoption – one that many firms are trying to overcome, with Google, Nvidia and Intel all seeking patents targeting similar objectives.