Why Salesforce Might Be Eying AI Image Generation

The tech could make graphics generation more intuitive as the company continues its push towards AI agents.

November 25, 2024

Photo via U.S. Patent and Trademark Office

Nat Rubio-Licht

nat@thedailyupside.com

Salesforce wants its AI to be picture perfect.

The company is seeking to patent a system for “controllable image generation,” technology that essentially builds guardrails into AI image generators to prevent them from running amok when asked to do several different tasks. Salesforce’s tech deals with diffusion models, which are primarily used in image generation.

Existing methods “exhibit poor performance when attempting to use the same [diffusion model] for multiple different tasks,” Salesforce said in the patent application. “Training or fine-tuning a separate [diffusion model] for each task is expensive in terms of memory and computation.”

Salesforce’s tech combines the capabilities of two models: one a pre-trained, fixed diffusion model and the other a trainable diffusion model for task adaptability. This flexible model may start as a copy of the fixed one, but with trainable parameters.

The pre-trained, fixed model is the core of the system, while the adaptable model works in tandem with it to modify image generation. When the system receives an input image and a task description, the flexible model basically fine-tunes specific layers of the fixed model to make the output more relevant to the task.

The design reduces the need to build separate diffusion models for different tasks, yielding one unified image model that’s adaptable to multiple requests.
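To make the two-model idea concrete, here is a toy sketch in Python. It is not Salesforce's implementation, and nothing in it comes from the patent text: `ToyLayer`, `ControlledModel`, `task_signal`, and the per-layer `gates` are hypothetical names, and the "layers" are simple multipliers rather than a real diffusion network. It only illustrates the pattern described above: a frozen base, a trainable copy that starts out identical, and a per-layer adjustment that is switched off until training turns it on.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyLayer:
    """Stand-in for one network layer: scales each input value by a weight."""
    weight: float

    def __call__(self, x):
        return [self.weight * v for v in x]


@dataclass
class ControlledModel:
    """Frozen base layers plus a trainable copy that nudges selected layers."""
    base_layers: list
    control_layers: list = field(init=False)
    gates: list = field(init=False)

    def __post_init__(self):
        # The flexible model starts as a copy of the fixed one.
        self.control_layers = copy.deepcopy(self.base_layers)
        # Assumption: gates start at zero, so the copy has no effect until trained.
        self.gates = [0.0] * len(self.base_layers)

    def forward(self, x, task_signal):
        for base, control, gate in zip(self.base_layers, self.control_layers, self.gates):
            base_out = base(x)  # the fixed model's weights are never modified
            # The control branch sees the input shifted by a (hypothetical) task signal.
            control_out = control([v + task_signal for v in x])
            # The gated control output adjusts this layer's result.
            x = [b + gate * c for b, c in zip(base_out, control_out)]
        return x


model = ControlledModel([ToyLayer(2.0), ToyLayer(0.5)])
untuned = model.forward([1.0, 2.0], task_signal=1.0)  # matches the base model alone
model.gates[1] = 0.1  # "turn the knob" on the second layer only
tuned = model.forward([1.0, 2.0], task_signal=1.0)
```

With every gate at zero, the combined model behaves exactly like the fixed one. Raising a single gate shifts the output without touching the base weights, which is the sense in which one shared model could serve several tasks.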

The innovative design of the system could overcome a common issue that image generators face, said Bob Rogers, Ph.D., the co-founder of BeeKeeperAI and CEO of Oii.ai. With typical image models, “you have to try a whole bunch of different prompts until you get the thing you like,” said Rogers. “It just is not systematic at all.”

Rather than leaving users to arrive at the right image via “trial and error,” the system could apply a “tunable controller” to diffusion models that allows for subtle and more efficient tweaks to images, he said. Imagine it as a knob that you can adjust to change an image instead of starting from scratch.

In an enterprise context, an easily adjustable image could expand Salesforce’s offerings for marketing and creative campaigns, said Rogers, saving time and resources so you don’t need to “cross your fingers” for a good outcome.

“You can’t ask people to randomly hope for the right kinds of graphics,” he said. “I think that control knob idea is actually very potentially compelling.”

The concept could also fit in with Salesforce’s broader push toward agent-based AI, said Rogers. As the company seeks to make interfacing with agents as easy and intuitive as possible, allowing customers to get to their desired outputs with minimal back and forth could make its products more desirable, he said.

“You could imagine with specific enterprise agents that are all about generating images, that Salesforce is going to be excited to have this tunable graphics-generation capability,” said Rogers.