Big Tech’s Super Vision

Baidu and Microsoft want to use computer vision for security checks.

Photo by David Tran via iStock


Tech firms are using AI to do more than just help you compose emails or conduct searches. Baidu and Microsoft have both filed patent applications for tech that detects attributes of a physical environment.

First up, let’s check out Baidu. The Chinese search giant is seeking to patent a method to train an AI model for “human body attribute detection.” The system works by training an AI model on positive and negative “sample sub-images,” or annotated images that essentially show one person in two scenarios. The trained model can then use computer vision to detect those attributes, such as whether or not a person is wearing a safety helmet.
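For a rough sense of what training on positive and negative samples looks like, here is a minimal sketch. It is not Baidu's method: the filing as described here doesn't specify a model, so this uses plain logistic regression, and the two numeric "features" standing in for each sub-image, along with every function name, are made up for illustration.

```python
import math
import random

def make_sample(wearing_helmet, rng):
    """Stand-in for an annotated sub-image: two made-up features plus a label."""
    # Hypothetical features, e.g. brightness and roundness of the head region.
    center = (0.8, 0.7) if wearing_helmet else (0.3, 0.4)
    features = [c + rng.gauss(0, 0.1) for c in center]
    return features, 1 if wearing_helmet else 0

def train(samples, epochs=200, lr=0.5):
    """Fit a logistic-regression classifier with per-sample gradient updates."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in samples:
            z = sum(w * x for w, x in zip(weights, features)) + bias
            pred = 1 / (1 + math.exp(-z))
            error = pred - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias

def predict(model, features):
    """Return 1 (helmet) or 0 (no helmet) for one feature vector."""
    weights, bias = model
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if z > 0 else 0

rng = random.Random(0)
# Even indices are positive samples (helmet), odd indices are negative ones.
train_set = [make_sample(i % 2 == 0, rng) for i in range(200)]
test_set = [make_sample(i % 2 == 0, rng) for i in range(100)]

model = train(train_set)
accuracy = sum(predict(model, f) == y for f, y in test_set) / len(test_set)
print(f"accuracy on held-out samples: {accuracy:.2f}")
```

A real system would learn from pixels rather than two hand-picked numbers, but the loop is the same idea: show the model labeled examples of both cases and nudge it toward the right answer.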

Baidu said in its filing that this kind of tech can be utilized in various safety inspection scenarios – to ensure employees are wearing the proper protective gear in manufacturing environments, or to determine whether or not someone is smoking in a non-smoking area of a workplace, for example. 

“In the safe operation and production environment of a factory area … The human body attribute detection performed on the staff is to ensure normal and safe operation,” Baidu said in its filing.

Microsoft is toying with a similar concept. The company wants to patent adaptive AI for “three-dimensional object detection” using synthetic training data. The approach uses synthetically generated images of containers “virtually packed with items of interest” and containers packed with items of “non-interest” to train a machine learning model to tell the two apart.
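To make the synthetic-data idea concrete, here is a toy sketch of how labeled "virtually packed" containers could be generated in bulk. It is an illustration under loud assumptions, not Microsoft's pipeline: the item names, the grid representation and the function names are all hypothetical, and a real system would render 3D scans rather than fill a small grid with strings.

```python
import random

# Hypothetical item lists; the filing's actual categories are not given here.
ITEMS_OF_INTEREST = ["knife", "firearm"]
ITEMS_OF_NON_INTEREST = ["laptop", "shoes", "book", "water bottle"]

def pack_container(rng, include_interest, size=4, n_items=5):
    """Return a size^3 grid of item names (None means empty) and its label."""
    grid = [[[None] * size for _ in range(size)] for _ in range(size)]
    cells = [(x, y, z) for x in range(size)
             for y in range(size) for z in range(size)]
    slots = rng.sample(cells, n_items)  # distinct random positions
    items = [rng.choice(ITEMS_OF_NON_INTEREST) for _ in slots]
    if include_interest:
        # Swap one harmless item for an item of interest.
        items[0] = rng.choice(ITEMS_OF_INTEREST)
    for (x, y, z), item in zip(slots, items):
        grid[x][y][z] = item
    return grid, int(include_interest)

def make_dataset(n, seed=0):
    """Generate n labeled containers, alternating positive and negative."""
    rng = random.Random(seed)
    return [pack_container(rng, include_interest=(i % 2 == 0))
            for i in range(n)]

dataset = make_dataset(1000)
positives = sum(label for _, label in dataset)
print(f"{len(dataset)} synthetic containers, {positives} with an item of interest")
```

The appeal is that the labels come for free: because the generator decides what goes into each container, nobody has to annotate the result by hand.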

Basically, Microsoft is making an AI version of the TSA. The company noted that this tech could be used at security checkpoints in places like airports or courthouses, where screening processes led by humans are “slow, expensive, and inaccurate.” Training these models to find items of interest takes massive amounts of data (hence the need for synthetic data).

“This requirement to curate large datasets for training AI models is a major drag on algorithm development, making it impossible to rapidly respond to emerging threats (e.g., 3D-printed weapons),” Microsoft noted in its filing. 

Baidu computer vision (left); Microsoft tech (right). Photos via the U.S. Patent and Trademark Office.

So why would these companies want to create AI security guards? Microsoft has been promoting its generative AI tool, 365 Copilot, while Baidu rolled out its own chatbot, Ernie Bot, to keep up with the rest of Big Tech’s AI race. But neither has shown an interest in physical security before, focusing instead on core moneymakers like cloud computing or hardware. What’s more, Abhinai Srivastava, co-founder of AI and computer vision startup Mashgin, said this kind of tech may not see the light of day for a long while.

The short answer is that Microsoft and Baidu are researching and filing patents for “everything under the sun,” Srivastava said. Despite the effort put into development, at the end of the day, an executive decides whether or not a piece of tech becomes a product, he said. It’s also not uncommon for tech developments like these to be put on the back burner for long periods of time, then pulled back out when competitors start to show interest.

“It just comes down to whether the executive teams see the value, or a path forward with the product,” said Srivastava. “Many times the problem is they’re already making a lot of money and they’re preoccupied with existing product lines. I think that’s why these things don’t usually see the light of day.” 

Whether or not Microsoft and Baidu’s interest in this tech is passive, computer vision is a burgeoning field. It has potential in all sorts of environments – factories, security checkpoints or, in Mashgin’s case, retail. 

But with that potential comes the obstacle of getting the tech to be 100% accurate. The biggest hurdle: Getting your hands on massive amounts of different, specialized data is expensive and time-consuming, said Srivastava. 

“Computer vision is very easy until it’s hard – and then it’s very hard,” Srivastava said. “The first 80% is very quick and very easy. But then the last 20% ends up being where you could spend the next five years perfecting it.” 

A lot of the computer vision field is at that 80% mark, he said, but over the next five years, developers are bound to push past it. Once they do, Baidu or Microsoft-branded computer vision devices may start hitting the proverbial shelves.
