Another example of using synthetic data for product development? Building robots.
“We’re seeing so much improvement in robotics these days,” says Agustin Huerta, SVP of digital innovation at software development company Globant. There are virtual environments, like the Nvidia Omniverse, where simulated robots can interact with simulated objects, creating large amounts of training data to jump-start a robot’s ability to navigate spaces or handle products.
“And if you’re talking about computer vision data for training autonomous driving solutions, we need synthetic data — there’s no other way to do it,” he says. “Otherwise, we’ll need to be crashing cars.”
5. Exploring new markets without historical data
Another use case for synthetic data is when a company has a product but wants to sell it in a new market. Businesses can model how consumers might behave, what they prefer, and how they might respond to new products or services, says Thota. They can also use the simulated data to help refine features and marketing strategies.
“A bank looking to enter a new region can use synthetic data to simulate local economic conditions, spending habits, and how people might adopt their financial products,” he adds.
Anand Rao, AI professor at Carnegie Mellon University, once worked with a ride-sharing company looking to expand to new markets. But using the same strategy everywhere wouldn’t have been very effective since conditions vary geographically.
“In New York City, you need a five to 10 minute turnaround,” Rao says. “They’re less tolerant of mispredictions, like if it says eight minutes but it takes 12 minutes for the car to come. But in Ann Arbor, Michigan, if it’s a few minutes late, they can live with it.”
That means the optimization strategies needed to be different, and synthetic data helped to refine those strategies.
“We had over 200,000 go-to-market scenarios for ten cities,” he adds. That gave executives real insights into how to adapt for the new markets.
6. Constructing digital twins
Historically, digital twins have been used for things like modeling jet engines, helping companies with predictive maintenance, or for designing and managing factories and other complex physical facilities. Today, the definition of digital twins is expanding to include things like software systems, business workflows, or even people.
Companies are simulating customers, their behaviors, shopping journeys, buying patterns, and how they’ll respond to a particular promotion, says Tom Edwards, Americas consumer AI leader at EY. They do it by creating synthetic customer profiles. “It helps us understand how different demographics will respond to different product positioning,” he says. “And what we get out is better demand forecasting and better targeting.”
And he’s seeing companies using synthetic personas instead of focus groups.
“You can create hundreds of personas and test different messaging,” he says. “Synthetic data allows you to fill in psychographic details.”
These simulated personas can also be used to improve ecommerce personalization.
“I can run millions of different combinations, and when it comes time for you to shop, I can immediately match you based on one of these preconfigured personas, built on synthetic data,” he adds. “I know you better than a traditional algorithm might because I’ve already extrapolated millions of potential paths forward.”
The business value here could be in the millions of dollars, he says, as it unlocks a way to seamlessly align with consumers and provide recommended products they haven’t seen before.
A company can also create digital twins of employees.
“Internally, one of the things we’re looking at is our staffing and skills,” says Nick Kramer, leader of applied solutions at SSA & Company, a management consulting firm.
“We have historic data about our consultants, and unreliable data about skills and capabilities,” he says. “But we have rich project data and out of that, we’ve got our lump of clay, so to speak, and have been experimenting with different ways to synthesize data.”
The synthetic personas could be people, project roles, or specific titles, he says. Those are combined into simulated project teams, and that, in turn, creates a view of what staffing could look like and how to balance it against skills and tools, and how to optimize for outcome, speed, revenues, and margins.
7. Preparing for agentic AI
As AI evolves, so do the opportunities to use synthetic data. This year, for example, it’s all about agentic AI.
According to an April Cloudera survey, 96% of enterprise IT leaders say they plan to expand their use of AI agents in the next 12 months. And although 57% say they’ve already implemented AI agents, the single biggest barrier is data privacy, with 53% saying it’s slowing adoption. But it’s not just about preserving privacy when it comes to training AI agents.
“Synthetic data is a great way to accelerate the learning of those agents and map through complex scenarios,” says EY’s Edwards. It can also be used to ensure that agents can handle anything that’s thrown at them.
“If you’re able to run millions of different scenarios based on complex interactions, that becomes an incredibly valuable tool,” he says. “It’s going to become a foundational aspect for how you deploy an agent within an organization.”
Reality check: The risks of overreliance on synthetic data
There are also dangers in overusing synthetic data. As Panetta discovered when trying to create synthetic images of people wearing face masks, it has its limits.
“If abused, you risk the equivalent of the overfitting problem where outputs become highly repetitive,” says Gordon Van Huizen, SVP of strategy at Mendix, an AI platform company. “Then feeding a prompt outside the training data can result in random or bizarre results because the system has difficulty interpreting the new pattern.”
There are ways to address this, though. Companies can create more diverse data sets, blend synthetic data with real data, or add noise to the data to create outliers.
“But the key to capitalizing on synthetic data is to always include human validation protocols wherever possible,” he says.
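Two of the mitigations mentioned above, blending synthetic data with real data and adding noise, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any company's actual method: the function name `blend_with_noise`, the `synthetic_share` and `noise_scale` parameters, and the sample values are all hypothetical.

```python
import random


def blend_with_noise(real, synthetic, synthetic_share=0.5, noise_scale=0.1, seed=0):
    """Return a shuffled mix of real values and noise-perturbed synthetic values.

    synthetic_share: target fraction of synthetic rows in the mix (hypothetical knob).
    noise_scale: standard deviation of the Gaussian noise, relative to each value.
    """
    if not 0 <= synthetic_share < 1:
        raise ValueError("synthetic_share must be in [0, 1)")
    rng = random.Random(seed)

    # How many synthetic rows are needed to reach the target share of the mix,
    # capped by how many synthetic rows are actually available.
    n_synth = round(len(real) * synthetic_share / (1 - synthetic_share))
    n_synth = min(n_synth, len(synthetic))

    # Perturb each chosen synthetic value so the set is less repetitive.
    picked = rng.sample(synthetic, n_synth)
    noisy = [x + rng.gauss(0, noise_scale * abs(x)) for x in picked]

    # Keep a source label on every row so reviewers can tell the two apart.
    mixed = [(x, "real") for x in real] + [(x, "synthetic") for x in noisy]
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    real_values = [12.0, 15.5, 9.8, 14.1, 11.3, 13.7]
    synthetic_values = [12.5] * 10  # deliberately repetitive synthetic output
    for value, source in blend_with_noise(real_values, synthetic_values, synthetic_share=0.25):
        print(f"{source:9s} {value:.2f}")
```

Keeping the source label on each row is one simple way to support the human validation step: a reviewer can spot-check the synthetic rows separately before the mix is used for training.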