Data isn’t the only thing that makes an AI model work. Google wants to make sure its chips are up to snuff, too.
The tech firm is seeking to patent a system for “debugging correctness issues” when training machine learning models. Google defines a “correctness issue” as basically a failure in training execution, or when the outcome is “not deemed to be acceptable for a particular context.” These correctness issues can stem from the “configuration” of the computing system performing the training.
Google’s system trains two machine learning models using two different computing systems and compares them to one another. The system uses “shared training operations” on each, meaning that the only difference in training is the computing systems themselves.
Google’s system then comes up with a “similarity measure” between the two models, which is determined by comparing each model’s output. The similarity measure is then used to compare how well different computing systems train models and to identify what needs to be debugged, Google noted.
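The comparison described above can be sketched in a few lines of Python. This is only a rough illustration, not the method in Google’s filing: the two “computing systems” are simulated by a hypothetical `quantize` function that rounds each updated weight, the model is a toy one-variable linear fit, and the “similarity measure” is assumed to be the mean absolute difference between the two models’ outputs (lower means more similar). All function names here are made up for the example.

```python
import random


def make_data(n, seed):
    """Build a small synthetic dataset shared by both training runs."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [3.0 * x + 0.5 + rng.gauss(0, 0.05) for x in xs]
    return xs, ys


def train(xs, ys, quantize, steps=200, lr=0.1):
    """Shared training operations: same data, same start, same update rule.

    `quantize` is a stand-in (an assumption of this sketch) for how a
    particular computing system stores numbers.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = 2 * sum((w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = 2 * sum((w * x + b - y) for x, y in zip(xs, ys)) / n
        w = quantize(w - lr * grad_w)
        b = quantize(b - lr * grad_b)
    return w, b


def full_precision(value):
    """Simulated system A: keeps the value as computed."""
    return value


def low_precision(value):
    """Simulated system B: keeps only two decimal places."""
    return round(value, 2)


def output_difference(model_a, model_b, inputs):
    """Assumed similarity measure: mean absolute gap between outputs."""
    (wa, ba), (wb, bb) = model_a, model_b
    gaps = [abs((wa * x + ba) - (wb * x + bb)) for x in inputs]
    return sum(gaps) / len(gaps)


xs, ys = make_data(50, seed=0)
eval_inputs = [i / 10 for i in range(-10, 11)]

model_a = train(xs, ys, full_precision)
model_b = train(xs, ys, low_precision)
model_a_again = train(xs, ys, full_precision)

# Same system twice: outputs should match exactly.
baseline_gap = output_difference(model_a, model_a_again, eval_inputs)
# Different systems: any gap is attributable to the simulated hardware.
gap = output_difference(model_a, model_b, eval_inputs)
print(f"same system gap: {baseline_gap}, cross-system gap: {gap}")
```

Because everything except `quantize` is held constant, a nonzero gap between the two runs points at the simulated system rather than at the data or the training recipe, which is the basic idea behind the comparison.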
Think of it this way: Imagine two cars taking a road trip from San Francisco to Los Angeles, stopping at the same spots for gas and hitting the same traffic. But one car is a brand-new Maserati and the other is, well, a 2002 Ford Fiesta. You can probably guess which car will get there first.
It’s sometimes easy to tell when a computing system isn’t doing its job properly, but training a neural network can involve a degree of “fuzziness,” said Kevin Gordon, co-founder of AI consulting and development firm Velora Labs.
Neural networks are not “super-precise instruments,” Gordon said, so their parameters and conditions can shift to a certain extent without breaking accuracy. Google’s system may aim to figure out how different AI training hardware impacts that breaking point.
(Going back to the car analogy, Google’s system may be trying to figure out how good a Fiesta needs to be to compete with the Maserati. While it may not be able to get from San Francisco to L.A. at exactly the same time, the difference may be negligible.)
If this patent is related to Google’s internal hardware effort, it could provide a major benefit to the company’s chip business. While Google isn’t a big name in chips, if it can build hardware that can “tolerate fuzziness in computation” without compromising accuracy, that opens the door for the company to create a “100-times more power efficient chip.”
“If they can solve the kind of fuzziness of it, they can have something that’s really competitive compared to what NVIDIA, or really anybody else, offers,” said Gordon.
Google has been trying to go after NVIDIA’s dominance in the AI hardware space. In April, the company released a research paper claiming that its Tensor Processing Units, which power more than 90% of its AI training as part of its supercomputers, are faster and more energy efficient than NVIDIA’s comparable A100 chip.
While Google does not sell its TPUs outright, the hardware is a major piece of the company’s AI work. Making these chips even more efficient could be part of its plan to remain a top player in the AI arms race.