Nvidia Wants to Ensure Machine-Learning Data Is Labeled Properly

When a company’s data is labeled incorrectly, machine-learning models can’t use it correctly. Nvidia wants to patent a potential solution.

October 16, 2025

Caitlin Wolper Phillips

Guest Contributor to The Daily Upside

Bad information can be misleading, confusing and annoying — if not to machine learning models themselves, at least to their users.

That’s why Nvidia is looking to patent systems and methods that evaluate “labeled training data for machine learning systems and applications.”

When a company’s labeled data is wrong or inconsistent, the model isn’t able to learn properly.

“If these errors are not corrected before generating training data that includes the labeled sensor data, the training data may be inadequate for its intended purpose, such as training a machine learning model,” the patent says.

The system Nvidia seeks to patent uses “consensus labels” to check whether the labels are accurate, according to its application.

First, AI models or other algorithms automatically label data from sensors, such as images or 3D point clouds. A human reviewer then checks those labels and corrects them as necessary, which produces the first set of labels.

But “even by having these users manually verify and/or update the initial labels, at least a portion of the labels may still be inaccurate based on user error,” the patent says.

To double-check, additional people are shown the same data and label it independently, creating a second set of labels. The system then derives consensus labels from that second set by identifying where the reviewers agree.

Afterward, the system compares the first labels with the consensus labels to check how accurate the first labels were.
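
For readers who want to see the idea in concrete terms, here is a minimal Python sketch of that comparison. It is an illustration, not Nvidia's method: the application as described here does not say how consensus is calculated, so the simple majority vote, the `min_agreement` threshold, the function names and the sample data are all assumptions, and real sensor labels (bounding boxes, for example) would need a more involved comparison than matching strings.

```python
from collections import Counter


def consensus_label(votes, min_agreement=0.5):
    """Return the most common label if its share of the votes exceeds
    min_agreement; otherwise return None (no consensus).

    Majority voting is an assumption made for this sketch.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) > min_agreement else None


def evaluate_first_labels(first_labels, reviewer_labels, min_agreement=0.5):
    """Compare the first set of labels with the consensus labels.

    Returns (accuracy, flagged), where accuracy is the share of items with a
    consensus whose first label matches it, and flagged lists the item IDs
    whose first label disagrees with the consensus.
    """
    compared = 0
    agreed = 0
    flagged = []
    for item_id, first in first_labels.items():
        consensus = consensus_label(reviewer_labels.get(item_id, []), min_agreement)
        if consensus is None:
            continue  # reviewers did not agree, so there is nothing to compare
        compared += 1
        if first == consensus:
            agreed += 1
        else:
            flagged.append(item_id)
    accuracy = agreed / compared if compared else None
    return accuracy, flagged


# Hypothetical data: one reviewed label per item, plus three independent reviewers.
first_labels = {"img_1": "car", "img_2": "truck", "img_3": "pedestrian"}
reviewer_labels = {
    "img_1": ["car", "car", "car"],
    "img_2": ["car", "car", "truck"],
    "img_3": ["pedestrian", "cyclist", "sign"],
}

accuracy, flagged = evaluate_first_labels(first_labels, reviewer_labels)
print(accuracy, flagged)
```

In this made-up example, the reviewers agree on two of the three items, and the first label matches the consensus on one of those two. The item where reviewers split three ways is skipped rather than counted as an error, which is one possible design choice among several.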

AI systems trained on accurate, reliable data should then perform better.

Right now, a lot of money is going into startups dedicated to data labeling: Meta recently invested $14.3 billion in Scale AI. And that’s because good data matters. It’s key to training your models to be as efficient and accurate as they can be.