Academic research & studies on AI in (synthetic) market research

Artificial Intelligence (AI), and more specifically modern Large Language Models (LLMs), are changing the entire market research industry by enabling the generation of synthetic data of such quality that it can augment, if not replace, human respondents.

Around the world, scientists are investigating and reporting exactly how precise and useful LLMs are at replicating human opinions.

This blog entry aims to be a repository of interesting scientific research that is moving the entire industry, and guiding our team in building OpinioAI as a platform. I'll include some of the interesting abstracts and quotes for easier digestion.

I wish to personally thank all the researchers working and publishing on this topic. Their studies are fascinating to read and extremely useful. Thank you!

The article will be constantly updated as new research is discovered. If you know of any, please share it with me at nikola [at]!

Research, studies & articles:

  • Argyle, L. P., Busby, E. C., Fulda, N., Gubler, J. R., Rytting, C., & Wingate, D. (2022). Out of one, many: Using language models to simulate human samples. arXiv.

We propose and explore the possibility that language models can be studied as effective proxies for specific human sub-populations in social science research. Practical and research applications of artificial intelligence tools have sometimes been limited by problematic biases (such as racism or sexism), which are often treated as uniform properties of the models.

We show that the “algorithmic bias” within one such tool — the GPT-3 language model — is instead both fine-grained and demographically correlated, meaning that proper conditioning will cause it to accurately emulate response distributions from a wide variety of human subgroups. We term this property “algorithmic fidelity” and explore its extent in GPT-3.

We create “silicon samples” by conditioning the model on thousands of socio-demographic backstories from real human participants in multiple large surveys conducted in the United States.

We then compare the silicon and human samples to demonstrate that the information contained in GPT-3 goes far beyond surface similarity. It is nuanced, multifaceted, and reflects the complex interplay between ideas, attitudes, and socio-cultural context that characterize human attitudes. We suggest that language models with sufficient algorithmic fidelity thus constitute a novel and powerful tool to advance understanding of humans and society across a variety of disciplines.
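The conditioning step described above can be sketched as simple prompt construction; the template, field names, and example profile below are illustrative assumptions, not taken from the paper:

```python
# A minimal sketch of "silicon sampling": condition a language model on a
# first-person socio-demographic backstory before asking a survey question.

def backstory_prompt(profile: dict, question: str) -> str:
    """Turn a respondent profile into a first-person backstory plus a survey question."""
    backstory = (
        f"I am a {profile['age']}-year-old {profile['gender']} from {profile['state']}. "
        f"Ideologically, I consider myself {profile['ideology']}. "
        f"My highest level of education is {profile['education']}."
    )
    return f"{backstory}\nQuestion: {question}\nAnswer:"

profile = {
    "age": 45,
    "gender": "woman",
    "state": "Ohio",
    "ideology": "moderate",
    "education": "a bachelor's degree",
}
prompt = backstory_prompt(profile, "Which party did you vote for in 2020?")
print(prompt)
```

In practice a prompt like this would be sent to the model once per real-participant backstory, and the completions pooled into the "silicon sample".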

  • Hämäläinen, P., Tavast, M., & Kunnari, A. (2023). Evaluating Large Language Models in Generating Synthetic HCI Research Data: a Case Study. ACM Digital Library.

  • Aher, G. (2022, August 18). Using large language models to simulate multiple humans and replicate human subject studies.

  • Horton, J. J. (2023, January 18). Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?

  • Brand, J., Israeli, A., & Ngwe, D. (2023). Using GPT for Market Research. Harvard Business School Working Paper.
Large language models (LLMs) have quickly become popular as labor-augmenting tools for programming, writing, and many other processes that benefit from quick text generation. In this paper we explore the uses and benefits of LLMs for researchers and practitioners who aim to understand consumer preferences.

We focus on the distributional nature of LLM responses, and query the Generative Pre-trained Transformer 3.5 (GPT-3.5) model to generate hundreds of survey responses to each prompt. We offer two sets of results to illustrate our approach and assess it.

First, we show that GPT-3.5, a widely-used LLM, responds to sets of survey questions in ways that are consistent with economic theory and well-documented patterns of consumer behavior, including downward-sloping demand curves and state dependence.

Second, we show that estimates of willingness-to-pay for products and features generated by GPT-3.5 are of realistic magnitudes and match estimates from a recent study that elicited preferences from human consumers.

We also offer preliminary guidelines for how best to query information from GPT-3.5 for marketing purposes and discuss potential limitations.
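The distributional idea, querying the model hundreds of times per prompt and tallying the answers, can be sketched as follows; `sample_response` is a hypothetical stub standing in for a real GPT-3.5 call, and the toy answer shares are invented:

```python
import random
from collections import Counter

def sample_response(prompt: str, rng: random.Random) -> str:
    """Hypothetical stub standing in for a GPT-3.5 API call; a real
    implementation would send `prompt` with temperature > 0."""
    return rng.choice(["Buy", "Buy", "Don't buy"])  # toy 2:1 preference

def response_distribution(prompt: str, n: int = 300, seed: int = 0) -> dict:
    """Query the model n times and tally the answers into shares."""
    rng = random.Random(seed)
    counts = Counter(sample_response(prompt, rng) for _ in range(n))
    return {answer: count / n for answer, count in counts.items()}

dist = response_distribution("Would you buy this laptop at $799?")
print(dist)
```

Repeating this at several price points would trace out a demand curve from the synthetic response shares.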

  • Li, P., Castelo, N., Katona, Z., & Sárváry, M. (2022). Language Models for Automated Market Research: A New Way to Generate Perceptual Maps. Social Science Research Network.

  • Chu, E., Andreas, J., Ansolabehere, S., & Roy, D. (2023). Language models trained on media diets can predict public opinion. arXiv.

Public opinion reflects and shapes societal behavior, but the traditional survey-based tools to measure it are limited.

We introduce a novel approach to probe media diet models — language models adapted to online news, TV broadcast, or radio show content — that can emulate the opinions of subpopulations that have consumed a set of media.

To validate this method, we use as ground truth the opinions expressed in U.S. nationally representative surveys on COVID-19 and consumer confidence.

Our studies indicate that this approach is (1) predictive of human judgements found in survey response distributions and robust to phrasing and channels of media exposure, (2) more accurate at modeling people who follow media more closely, and (3) aligned with literature on which types of opinions are affected by media consumption. 

  • Li, P., Castelo, N., Katona, Z., & Sárváry, M. (2024). Frontiers: Determining the validity of large language models for automated perceptual analysis. Marketing Science.
  • Arora, N., Chakraborty, I., & Nishimura, Y. (2024). Hybrid Marketing Research: Large Language Models as an Assistant. Available at SSRN.

An area within marketing that is well poised for adoption of large language models (LLMs) is marketing research. In this paper the authors empirically investigate how LLMs could potentially assist at different stages of the marketing research process.

They partnered with a Fortune 500 food company and replicated a qualitative and a quantitative study that the company conducted using GPT-4. The authors designed the system architecture and prompts necessary to create personas, ask questions, and obtain answers from synthetic respondents. Their findings suggest that LLMs present a big opportunity, especially for qualitative research.

The LLMs can help determine the profile of individuals to interview, generate synthetic respondents, interview them, and even moderate a depth interview. The LLM-assisted responses are superior in terms of depth and insight.

The authors conclude that the AI-human hybrid has great promise and LLMs could serve as an excellent collaborator/assistant for a qualitative marketing researcher. The findings for the quantitative study are less impressive.

The LLM correctly picks the answer direction and valence but does not recover the true response distributions well. In the future, approaches such as few-shot learning and fine-tuning may result in synthetic survey data that mimic human data more accurately.

  • Sun, S., Lee, E., Nan, D., Zhao, X., Lee, W., Jansen, B. J., & Kim, J. H. (2024, February 28). Random Silicon Sampling: Simulating Human Sub-Population Opinion Using a Large Language Model Based on Group-Level Demographic Information.

Large language models exhibit societal biases associated with demographic information, including race, gender, and others. Endowing such language models with personalities based on demographic data can enable generating opinions that align with those of humans.

Building on this idea, we propose “random silicon sampling,” a method to emulate the opinions of the human population sub-group. Our study analyzed 1) a language model that generates the survey responses that correspond with a human group based solely on its demographic distribution and 2) the applicability of our methodology across various demographic subgroups and thematic questions.

Through random silicon sampling and using only group-level demographic information, we discovered that language models can generate response distributions that are remarkably similar to the actual U.S. public opinion polls.

Moreover, we found that the replicability of language models varies depending on the demographic group and topic of the question, and this can be attributed to inherent societal biases in the models. Our findings demonstrate the feasibility of mirroring a group’s opinion using only demographic distribution and elucidate the effect of social biases in language models on such simulations.
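As a rough sketch of the group-level idea, one can draw synthetic respondent profiles from marginal demographic distributions and then prompt a model once per profile; the attribute names and shares below are illustrative, and sampling attributes independently is a simplifying assumption:

```python
import random

# Group-level marginal distributions (illustrative shares, not real census data).
MARGINALS = {
    "gender": (["man", "woman"], [0.49, 0.51]),
    "age_band": (["18-34", "35-54", "55+"], [0.30, 0.33, 0.37]),
    "education": (["high school", "college", "postgraduate"], [0.40, 0.45, 0.15]),
}

def random_silicon_profile(rng: random.Random) -> dict:
    """Draw one synthetic respondent by sampling each attribute independently
    from its group-level marginal distribution."""
    return {
        attr: rng.choices(values, weights=weights, k=1)[0]
        for attr, (values, weights) in MARGINALS.items()
    }

rng = random.Random(42)
sample = [random_silicon_profile(rng) for _ in range(5)]
for profile in sample:
    print(profile)
```

Each sampled profile would then seed a persona prompt, and the pooled completions approximate the group's response distribution.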

  • Kim, J., & Lee, B. (2023, May 16). AI-Augmented Surveys: Leveraging large language models and surveys for opinion prediction.

  • Sarstedt, M., Adler, S. J., Rau, L. A., & Schmitt, B. H. (2024). Using large language models to generate silicon samples in consumer and marketing research: Challenges, opportunities, and guidelines. Psychology & Marketing.

  • Schoenegger, P., Tuminauskaite, I., Park, P. S., Bastos, R. V. S., & Tetlock, P. E. (2024, February 29). Wisdom of the Silicon Crowd: LLM ensemble prediction capabilities rival human crowd accuracy.

In Study 1, we expand this research by using an LLM ensemble approach consisting of a crowd of 12 LLMs. We compare the aggregated LLM predictions on 31 binary questions to those of a crowd of 925 human forecasters from a three-month forecasting tournament.

Our preregistered main analysis shows that the LLM crowd outperforms a simple no-information benchmark, and is not statistically different from the human crowd. We also observe a set of human-like biases in machine responses, such as an acquiescence effect and a tendency to favour round numbers.

In Study 2, we test whether LLM predictions (of GPT-4 and Claude 2) can be improved by drawing on human cognitive output. We find that both models’ forecasting accuracy benefits from exposure to the median human prediction as information, improving accuracy by between 17% and 28%, though this leads to less accurate predictions than simply averaging human and machine forecasts.

Our results suggest that LLMs can achieve forecasting accuracy rivaling that of the human crowd: via the simple, practically applicable method of forecast aggregation.
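The aggregation step can be sketched in a few lines; using a median for the LLM crowd and a simple human-machine average are assumptions for illustration, and the forecast values are invented:

```python
from statistics import median

def aggregate_llm_forecasts(probabilities: list[float]) -> float:
    """Aggregate a crowd of per-model probability forecasts by taking the median."""
    return median(probabilities)

def blend_with_humans(machine: float, human: float) -> float:
    """Simple average of the machine aggregate and the human-crowd forecast."""
    return (machine + human) / 2

# Twelve illustrative per-model forecasts for one binary question.
llm_forecasts = [0.62, 0.55, 0.70, 0.58, 0.65, 0.60,
                 0.72, 0.50, 0.66, 0.59, 0.63, 0.61]
machine = aggregate_llm_forecasts(llm_forecasts)
print(machine)                       # aggregated LLM-crowd forecast
print(blend_with_humans(machine, human=0.48))
```

The same aggregation would run independently per question, with accuracy scored against resolved outcomes.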

Written by Nikola K.

February 19, 2024
