NVIDIA’s Synthetic Data Farm

NVIDIA wants to take the work out of creating datasets.

Photo via NVIDIA Media Assets.

AI models are voraciously data-hungry. NVIDIA wants to satisfy their hunger. 

The chip company wants to patent a system for generating “synthetic datasets for training neural networks.” It essentially uses a generative AI model to synthesize datasets that can be used in training a machine learning model for specific visual tasks, such as autonomous driving, robotics or facial recognition. 

Feeding sample visual data to the generative model creates synthetic datasets that are more representative of authentic ones. “The generative model therefore serves as an aid in bridging the content gap that previously existed between synthetic data and real-world data,” NVIDIA said in its filing. 

The machine learning model is trained using the synthetic dataset, and is validated against a “real-world validation dataset,” a.k.a. an authentic one. How well the synthetic dataset trains the machine learning model is then used for “fine-tuning the generative model for making more synthetic datasets.” 
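To make that feedback loop concrete, here is a toy sketch in Python. It is not NVIDIA's method: the filing excerpts quoted here don't spell out how the fine-tuning works, so this example swaps in a simple accept-if-better random search. Every name in it (`generate_synthetic`, `train_task_model`, `feedback_loop`, the `gap` parameter) is hypothetical, and the "generative model" is just a sampler with one tunable number.

```python
import random

def make_real_validation_set(rng, n=200):
    """Stand-in for the real-world validation set: class 0 near 0.0, class 1 near 3.0."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(3.0 * label, 1.0), label))
    return data

def generate_synthetic(rng, gap, n=200):
    """Toy 'generative model' with one tunable parameter: the gap between class means."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(gap * label, 1.0), label))
    return data

def train_task_model(dataset):
    """Toy task model: a threshold halfway between the two class means."""
    means = []
    for c in (0, 1):
        xs = [x for x, y in dataset if y == c]
        means.append(sum(xs) / len(xs))
    return (means[0] + means[1]) / 2.0

def validate(threshold, real_data):
    """Accuracy of the threshold classifier on the real validation set."""
    correct = sum((x > threshold) == bool(y) for x, y in real_data)
    return correct / len(real_data)

def feedback_loop(rounds=30, seed=0):
    """Assumed tuning rule: keep a generator tweak only if validation accuracy does not drop."""
    rng = random.Random(seed)
    real_val = make_real_validation_set(rng)
    gap = 0.5  # deliberately poor starting generator
    initial = validate(train_task_model(generate_synthetic(rng, gap)), real_val)
    best_score = initial
    for _ in range(rounds):
        candidate = gap + rng.uniform(-1.0, 1.0)   # propose a tweak to the generator
        synthetic = generate_synthetic(rng, candidate)  # make a new synthetic dataset
        score = validate(train_task_model(synthetic), real_val)  # train, then check on real data
        if score >= best_score:                    # feed the result back into the generator
            gap, best_score = candidate, score
    return initial, best_score, gap

if __name__ == "__main__":
    initial, best, gap = feedback_loop()
    print(f"initial accuracy={initial:.2f}, best accuracy={best:.2f}, tuned gap={gap:.2f}")
```

The point of the sketch is the shape of the loop: the real data is only ever used to score the trained model, and that score is the only signal steering the generator.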

While synthetic data is already being used to solve the “laborious, costly, and time consuming task” of data collection for visual AI systems, NVIDIA said conventional methods sometimes require experts to create “virtual worlds” to harvest synthetic data, which can be resource-intensive and may not accurately mimic real-world scenes. 

Photo via the U.S. Patent and Trademark Office.

Having access to loads of synthetic data can make training AI a far more accessible task, said Kevin Gordon, co-founder of AI consulting and development firm Velora Labs. Assembling massive training datasets takes tons of time and resources, and for small companies or individual developers, that cost often isn’t practical. 

“This possibility of using a neural network system to just generate almost infinite content is really appealing,” said Gordon. “Especially for visual tasks where you really do need a lot of data in general … that can be really hard to capture and aggregate.” 

Another benefit of synthetic data: preserving privacy. These datasets don’t entirely eliminate the use of real-world data (which can be connected to real-world people), as the AI model that creates them is trained on authentic data. However, extracting any authentic data from an AI model trained on synthetic data is very difficult, said Gordon. 

“At the very least, it abstracts data,” said Gordon. “That level of decoupling can help with privacy. I wouldn’t say it solves it completely, but it does do a really good job of obfuscation.”  

NVIDIA certainly isn’t the first company to consider synthetic data, said Gordon. Plenty of companies have been working on using synthetic data to solve what Gordon calls “the data problem.” 

Gordon told me that about 1 in 4 companies at the Conference on Computer Vision and Pattern Recognition in late June were data-focused, many of them synthetic data providers. And because NVIDIA’s patent is filled with broad strokes and wide-reaching claims, actually securing it may be a difficult task, he said. 

One thing that sets NVIDIA’s patent apart is its mention of robotic systems. Generating synthetic data for robotics is a particularly difficult task compared to data collection for a large language model, said Gordon. And given that NVIDIA has a substantial robotics division, this tech could complement that work. 

At the end of the day, though, NVIDIA’s biggest moneymaker is its chips. Helpful software — or anything else — is just icing on the cake. “Really, they’re interested in getting people to want to use their software so that they buy their chips,” Gordon said. “Having this as part of their solutions for more cheaply developed AI systems … This can help them maintain the top spot as the number one AI chip provider.” 

Have any comments, tips or suggestions? Drop us a line! Email us at admin@patentdrop.xyz or shoot us a DM on Twitter @patentdrop.