Synthetic data for AI-powered market research

Market research is a vital tool for businesses to understand their customers, competitors, and trends.

There’s an entire $82bn industry focused on getting to those insights so businesses can make better decisions.(1)

Arguably, it’s a very traditional industry. Many market research methods are outdated and inefficient, relying on surveys, interviews, and focus groups that are time-consuming and expensive to run.

These methods also limit the scope and quality of the data that can be collected and analyzed.

Artificial intelligence and its potential for the market research industry

Modern Large Language Models (LLMs) are changing market research by making it possible to generate synthetic data: data that is artificially created to mimic real data.

Researchers are finding that LLMs can replicate human opinions, biases and ‘thoughts’ closely enough that the synthesized data can augment, and in some cases perhaps stand in for, human respondents.

Brand, et al. in their working paper ‘Using GPT for Market Research’ show that GPT-3.5, a large language model, can produce hundreds of survey responses per prompt, and that these responses are aligned with economic theory and consumer behavior patterns. They also demonstrate that GPT-3.5 can produce willingness-to-pay estimates for products and features that match those obtained from human consumers.(2)

Lisa P. Argyle, et al. compare ‘silicon’ samples generated by GPT-3 with human samples and report that “the information contained in GPT-3 goes far beyond surface similarity. It is nuanced, multifaceted, and reflects the complex interplay between ideas, attitudes, and socio-cultural context that characterize human attitudes. We suggest that language models with sufficient algorithmic fidelity thus constitute a novel and powerful tool to advance understanding of humans and society across a variety of disciplines.”(3)

This research suggests that synthetic data can be used to augment surveys and help generate the data that research needs.

Synthetic data can help augment existing data sets, fill in missing or incomplete data, or create entirely new data for testing and experimentation. Synthetic data can also be tailored to specific scenarios, segments, or variables, allowing for more granular and accurate insights.
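To make the idea of tailoring synthetic data to segments more concrete, here is a minimal Python sketch of persona-conditioned prompting. It is only an illustration under stated assumptions, not the method used in the cited papers or by any particular product: the `Persona` fields, the prompt wording, and the `ask_llm` callable are all hypothetical, and `fake_llm` is a placeholder you would swap for a call to whichever model you actually use.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Persona:
    """A hypothetical respondent profile used to condition the prompt."""
    age_group: str
    region: str
    income: str


def make_personas(segments: Dict[str, List[str]]) -> List[Persona]:
    """Cross every segment value to get one persona per combination."""
    keys = ["age_group", "region", "income"]
    return [Persona(*combo) for combo in product(*(segments[k] for k in keys))]


def build_prompt(persona: Persona, question: str) -> str:
    """Render one survey prompt for a given persona (wording is an assumption)."""
    return (
        f"You are a survey respondent aged {persona.age_group}, "
        f"living in a {persona.region} area, with {persona.income} household income.\n"
        f"Question: {question}\n"
        "Answer in one or two sentences, in the first person."
    )


def collect_responses(
    personas: List[Persona],
    question: str,
    ask_llm: Callable[[str], str],
    n_per_persona: int = 1,
) -> List[dict]:
    """Ask the same question n_per_persona times for every persona."""
    rows = []
    for persona in personas:
        prompt = build_prompt(persona, question)
        for _ in range(n_per_persona):
            rows.append({"persona": persona, "answer": ask_llm(prompt)})
    return rows


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return f"(placeholder answer to a {len(prompt)}-character prompt)"


if __name__ == "__main__":
    segments = {
        "age_group": ["18-24", "55-64"],
        "region": ["urban", "rural"],
        "income": ["middle"],
    }
    personas = make_personas(segments)
    rows = collect_responses(
        personas, "Would you pay extra for same-day delivery?", fake_llm, n_per_persona=2
    )
    print(len(personas), "personas,", len(rows), "synthetic answers")
```

Keeping the model call behind a plain callable means the segment logic can be tested without a model, and the same personas can be replayed against different LLMs to see how much the answers depend on the model.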

The perks are plentiful…

Synthetic data generated by AI has many advantages over traditional data sources.

It is faster, cheaper, and more scalable, as it does not require recruiting human respondents or obtaining their consent. It can also be more ethical and privacy-preserving, as generating it does not involve collecting or exposing sensitive or personal information.

Moreover, synthetic data can be used to explore new possibilities and hypotheses, as well as to validate and improve existing models and algorithms, faster than before.

Synthetic data generated by AI is opening up new avenues for innovation and growth in market research. It can help businesses understand their markets better and faster, identify new opportunities and threats, optimize their strategies and decisions, and enhance their customer experience and satisfaction.

This approach can help us create new forms of market research that were not possible before, such as simulating customer behavior, preferences, and reactions in different contexts and scenarios that otherwise would be impossible to create.

But it’s not perfect.

Of course, synthetic data is not a magic bullet that can solve all the challenges of market research. It has its own limitations and risks, such as:

  • The accuracy and reliability of synthetic data depend on the quality and representativeness of the real data it is modeled on.
  • The accuracy and reliability of synthetic data also depend on the LLM used. Not every LLM is the same; some hallucinate more than others.
  • Synthetic data may introduce biases or errors that are not present in the “real” data.
  • Synthetic data may not capture the complexity and dynamics of human behavior and preferences.
  • Synthetic data may be imprecise and incomplete.

Synthetic data is powerful and can help in many ways, but it should be used with caution and transparency, and complemented by other methods of market research. It is not a substitute for real data, but a tool to augment and enrich it.
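One simple way to practice that caution is to compare synthetic answers against whatever real answers you already have before trusting the synthetic ones. The Python sketch below uses total variation distance between the two answer distributions. This metric is our own choice for illustration and is not taken from the cited papers; the function names and the example numbers are made up.

```python
from collections import Counter
from typing import Dict, Iterable


def answer_shares(answers: Iterable[str]) -> Dict[str, float]:
    """Turn a list of categorical answers into the share of each option."""
    counts = Counter(answers)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one answer")
    return {option: n / total for option, n in counts.items()}


def total_variation_distance(real: Iterable[str], synthetic: Iterable[str]) -> float:
    """Half the summed absolute difference in shares.

    0.0 means identical distributions, 1.0 means no overlap at all.
    """
    p = answer_shares(real)
    q = answer_shares(synthetic)
    options = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in options)


if __name__ == "__main__":
    # Made-up example data: 100 real answers and 100 synthetic answers.
    real = ["yes"] * 60 + ["no"] * 40
    synthetic = ["yes"] * 75 + ["no"] * 25
    print(round(total_variation_distance(real, synthetic), 2))  # 0.15
```

What counts as "close enough" is a judgment call that depends on the decision at stake, so any threshold should be agreed on before looking at the results, and the check should be repeated per segment rather than only on the overall sample.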

Businesses should do real research as much as possible. AI is not yet at the level needed to fully replace human research. That may come in the future, but it is not there yet.

At the same time, AI is well suited to running pilot studies and, of course, to augmenting existing research data and efforts.

Introducing OpinioAI

At OpinioAI we’re building a research platform that uses AI language models and synthetic sampling to help researchers source relevant insights, data, and opinions in a simple and scalable way, without relying on polls, surveys and other legacy methods.

OpinioAI does not depend on real respondents or samples, but rather creates synthetic personas or segments that reflect the desired traits or existing data sources.

By using novel AI techniques, OpinioAI can generate realistic and diverse opinions that can help businesses conduct market research faster and better, even if they lack the resources or expertise to do so.

We have a lot of work ahead of us and we’re looking forward to ushering the market research industry into the new AI-powered era.

Feel free to check out the platform and let us know what you’d like to see there or how we can improve it at:


1. Statista,
2. Brand, James, Ayelet Israeli, and Donald Ngwe. “Using GPT for Market Research.” Harvard Business School Working Paper, No. 23-062, April 2023. (Revised July 2023.)
3. Argyle, Lisa P., Ethan C. Busby, Nancy Fulda, Joshua Gubler, Christopher Rytting, and David Wingate. “Out of One, Many: Using Language Models to Simulate Human Samples.” Published online by Cambridge University Press, February 2023.

Written by Nikola K.

November 9, 2023
