A digital twin in market research (also known as a synthetic persona) is a persistent AI model of a real or representative person. It carries that person's preferences, attitudes, demographics, and behavioral patterns, and it can answer new research questions as if it were that individual. Instead of going back to field every time you need to test a new concept, message, or product configuration, you query the twin.
The concept is simple, but the implications are significant. A panel of 1,000 digital twins can be re-surveyed instantly, at no additional recruitment cost, as many times as you need. The twins are consistent across studies because they carry the same underlying preference structure. And they can be built for any population: consumers, patients, or healthcare professionals.
This guide covers how digital twins work, the two approaches to creating them (purely synthetic and seeded from real data), and how they apply across consumer research, patient research, and HCP research.
What Makes a Digital Twin Different from a Synthetic Respondent?
A synthetic respondent is a one-time AI-generated survey response. You ask a question, the model generates an answer, and that is the end of the interaction. The respondent does not persist. If you want to ask a follow-up question, you generate a new response with no memory of the first one.
A digital twin is persistent. It has an identity, a preference profile, and a history. When you ask a digital twin a question, the answer is conditioned on everything the twin "knows" about itself: its demographics, its past responses, its utility scores, its attitudinal profile. Ask the same twin a different question next week and the answer will be consistent with its established preferences.
This persistence is what makes digital twins valuable for longitudinal research, iterative concept testing, and any scenario where you need to ask the same audience multiple rounds of questions without the cost and delay of re-fielding a panel.
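To make the contrast concrete, here is a minimal sketch of what "persistent" could mean as a data structure. The class name, field names, and sample values are hypothetical illustrations, not Simsurveys' actual schema: the point is only that a twin keeps its profile and its response history between questions, while a one-shot synthetic respondent keeps nothing.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalTwin:
    """Hypothetical persistent profile; all field names are illustrative."""
    twin_id: str
    demographics: dict
    utilities: dict  # e.g. part-worth utilities from a seed study
    history: list = field(default_factory=list)  # (question, answer) pairs

    def context(self) -> dict:
        """Everything a model would be conditioned on for the next question."""
        return {
            "demographics": self.demographics,
            "utilities": self.utilities,
            "past_responses": list(self.history),
        }

    def record(self, question: str, answer: str) -> None:
        """Store an exchange so later answers can stay consistent with it."""
        self.history.append((question, answer))


# Invented example values, for illustration only.
twin = DigitalTwin(
    twin_id="twin-001",
    demographics={"age_band": "35-44", "region": "Midwest"},
    utilities={"price_low": 1.2, "brand_known": 0.3},
)
twin.record("Would you pay more for faster delivery?", "Probably not")
```

Next week's question would be answered against `twin.context()`, which now includes the first exchange. A synthetic respondent, by contrast, would start from an empty history every time.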
Digital Twins, Synthetic Personas, and Synthetic Respondents
If you have been researching this space, you have likely encountered multiple terms that sound similar: digital twins, synthetic personas, and synthetic respondents. The terminology can be confusing, but the distinctions are straightforward.
Digital twins and synthetic personas refer to the same concept. Both describe persistent AI models of people that carry an individual's preferences, demographics, and behavioral patterns across multiple research interactions. Whether a vendor calls them digital twins or synthetic personas, the underlying idea is identical: a reusable, queryable profile of a person that can answer new questions on demand. The term "digital twin" comes from engineering and manufacturing, while "synthetic persona" emerged from the AI and market research community. They are interchangeable.
Synthetic respondents, by contrast, are one-shot. A synthetic respondent is a single AI-generated survey response with no persistent identity. It answers one set of questions and then ceases to exist. There is no memory, no continuity, and no ability to follow up. The distinction matters because persistence is what makes digital twins (or synthetic personas) valuable for iterative research, longitudinal tracking, and augmentation of existing panels.
Two Ways to Build Digital Twins
There are two fundamentally different approaches to creating digital twins for research. Both produce persistent, queryable respondent profiles. The difference is where the preference data comes from.
Purely Synthetic Digital Twins
Purely synthetic twins are generated entirely by AI models trained on population-level data. The model learns the joint distribution of demographics, attitudes, and behaviors from large training datasets (government surveys, panel studies, validated research data), then generates individual profiles that are statistically representative of a target population.
These twins do not correspond to any real individual. They are composites, built from learned patterns rather than observed choices. Purely synthetic twins are useful when you have no existing data to start from, when you need to research a population you have never surveyed before, or when speed is the primary constraint.
The trade-off is that purely synthetic twins are only as good as the population-level patterns they were trained on. They capture what is typical for a demographic segment, but they cannot capture the idiosyncratic preferences of a specific person.
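As a rough illustration of generating profiles from a learned joint distribution, the sketch below samples a small synthetic panel from a toy probability table. The categories and probabilities are invented for the example, and a production model would learn a far richer distribution than a lookup table. It does show why such twins reflect what is typical for a segment rather than any one person.

```python
import random

# Toy joint distribution over (age_band, price_attitude).
# Probabilities are invented for illustration and sum to 1.
JOINT = {
    ("18-34", "price_sensitive"): 0.30,
    ("18-34", "premium_seeker"): 0.10,
    ("35-54", "price_sensitive"): 0.20,
    ("35-54", "premium_seeker"): 0.15,
    ("55+", "price_sensitive"): 0.15,
    ("55+", "premium_seeker"): 0.10,
}


def sample_profiles(n: int, seed: int = 0) -> list[dict]:
    """Draw n composite profiles; each cell is sampled by its joint probability."""
    rng = random.Random(seed)  # fixed seed so the panel is reproducible
    cells = list(JOINT)
    weights = [JOINT[cell] for cell in cells]
    draws = rng.choices(cells, weights=weights, k=n)
    return [
        {"twin_id": f"syn-{i}", "age_band": age, "price_attitude": attitude}
        for i, (age, attitude) in enumerate(draws)
    ]


panel = sample_profiles(1000)
```

Because attributes are drawn jointly rather than independently, correlations present in the table (here, younger profiles skewing price-sensitive) carry through to the generated panel.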
Seeded Digital Twins
Seeded twins start from real data. You run a study, whether that is a conjoint, a survey, a panel, or a set of qualitative interviews, and use the observed responses to build an individual-level preference profile for each respondent. That profile becomes the seed for the twin.
The seed can take different forms depending on the study type:
- From conjoint data: Individual-level part-worth utilities estimated via hierarchical Bayes multinomial logit (HB-MNL) become the preference backbone of the twin. Each twin carries an estimate of how that specific respondent trades off price vs. features vs. brand, and uses that structure to generate consistent answers to new questions.
- From survey data: The twin is seeded with the respondent's actual answers to the original survey. When new questions are asked, the model generates responses conditioned on the established response patterns, demographics, and attitudes from the seed study.
- From qualitative data: Interview transcripts, open-ended responses, and verbatim comments can be encoded into a twin's profile, giving it a qualitative foundation that shapes how it responds to new prompts.
Seeded twins are more powerful than purely synthetic ones because they carry real, observed preference data. The twin is not guessing what someone like this respondent might think. It is extending what this specific respondent has already told you. This makes seeded twins particularly valuable for augmentation: asking your existing respondents new questions without going back to field.
Why Seeding Matters
The difference between a purely synthetic twin and a seeded twin is the difference between a demographic stereotype and an actual person's preferences.
Consider a conjoint study with 500 respondents. After HB-MNL estimation, you have 500 individual utility vectors. Each one is an estimate of what that person values and how they make tradeoffs. Respondent #247 is a price-sensitive cardiologist who deprioritizes brand. Respondent #391 is a brand-loyal PCP who is willing to pay a premium for a trusted name. These are not segment averages. They are individual preference structures derived from observed choice behavior.
When you seed a digital twin with respondent #247's utility vector, the twin inherits that specific preference structure. Ask the twin a messaging question, and the response will reflect a price-sensitive mindset. Ask it about a new product concept, and it will evaluate the concept through the lens of its established tradeoff behavior.
A purely synthetic twin assigned to the "cardiologist" segment would give you a plausible response, but it would be based on average cardiologist preferences, not on what respondent #247 specifically revealed through their choices. The seeded twin is more precise, more differentiated, and more useful for individual-level analysis.
This is why seeded digital twins are the preferred approach whenever you have existing data to build from. The seed grounds the twin in reality. Everything the twin generates after that is an extension of real, observed behavior.
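The gap between individual and averaged preferences can be sketched with the standard multinomial logit choice rule, where the probability of choosing a product is proportional to the exponential of its total utility. The part-worth numbers below are invented to mirror the two respondents described above; they are not real estimates, and the two-product scenario is a hypothetical.

```python
import math

# Hypothetical part-worth utilities (illustrative numbers, not real estimates).
UTILITIES = {
    "resp_247": {"price_low": 1.5, "price_high": -1.5,
                 "brand_trusted": 0.2, "brand_generic": -0.2},
    "resp_391": {"price_low": 0.3, "price_high": -0.3,
                 "brand_trusted": 1.4, "brand_generic": -1.4},
}

# Two hypothetical products, each described by one level per attribute.
PRODUCTS = {
    "generic_low_price": ["price_low", "brand_generic"],
    "trusted_high_price": ["price_high", "brand_trusted"],
}


def choice_shares(partworths: dict) -> dict:
    """Multinomial logit: P(product) = exp(U_product) / sum of exp(U) over products."""
    totals = {
        name: sum(partworths[level] for level in levels)
        for name, levels in PRODUCTS.items()
    }
    denom = sum(math.exp(u) for u in totals.values())
    return {name: math.exp(u) / denom for name, u in totals.items()}


# Averaging the two utility vectors stands in for a segment-level view.
levels = UTILITIES["resp_247"].keys()
average = {lvl: (UTILITIES["resp_247"][lvl] + UTILITIES["resp_391"][lvl]) / 2
           for lvl in levels}

shares_247 = choice_shares(UTILITIES["resp_247"])
shares_391 = choice_shares(UTILITIES["resp_391"])
shares_avg = choice_shares(average)
```

With these numbers, the price-sensitive respondent strongly favors the low-priced generic, the brand-loyal respondent strongly favors the trusted brand, and the averaged vector lands near an even split that describes neither of them.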
Digital Twins for Consumer Research
Consumer digital twins model the preferences, purchase drivers, and decision-making patterns of consumer populations. They are built from consumer survey data, brand tracking studies, conjoint results, and behavioral datasets.
Use Cases
- Iterative concept testing: Test 10, 20, or 50 product concepts against the same consumer panel without paying for 50 rounds of fielding. The twins evaluate each concept through their established preference lens, giving you consistent comparative data across all iterations.
- Message optimization: Run messaging studies where each twin reacts to different claims, headlines, or value propositions based on its individual preference profile. Identify which messages resonate with which consumer segments at the individual level.
- Pricing sensitivity: Model how specific consumer segments respond to price changes, and how price interacts with other product attributes for different types of buyers.
- Brand tracking augmentation: Seed twins from your last wave of brand tracking data, then query them between waves to get directional reads on how perception is shifting without waiting for the next fielding window.
- Competitive scenario planning: Simulate how your consumer base would respond to a competitor launch, a price change, or a new product entry, using twins that carry your customers' established preferences.
Simsurveys' consumer model supports both purely synthetic and seeded digital twins for consumer research. Twins can be targeted by any combination of demographic, psychographic, and behavioral criteria across 600+ variables.
Digital Twins for Patient Research
Patient digital twins model the healthcare experiences, treatment attitudes, medication behaviors, and condition-specific outcomes of patient populations. They are particularly valuable in healthcare research where patient recruitment is expensive, slow, and often constrained by ethical and regulatory considerations.
Use Cases
- Treatment satisfaction studies: Understand how patients with specific conditions evaluate their treatment experience, willingness to switch therapies, and barriers to adherence, without the cost and complexity of recruiting real patients.
- Drug awareness and attitude studies: Model how patients respond to new drug launches, assess awareness levels, and test messaging about mechanism of action, side effects, and benefits.
- Rare disease research: For conditions where patient populations are extremely small and hard to recruit, seeded digital twins built from even a small number of real patient interviews can be extended to generate statistically meaningful sample sizes.
- Patient journey mapping: Build twins that carry the full context of a patient's experience, from diagnosis through treatment decisions, to model how interventions at different points in the journey would affect outcomes and satisfaction.
- Health equity research: Generate representative twins for underserved populations that are systematically underrepresented in traditional panel research, using federal health data as the training foundation.
Simsurveys' patient model is trained on 500,000+ de-identified federal health records from NHIS, BRFSS, MEPS, NHANES, CAHPS, and PROMIS. Patient twins can be seeded from existing patient survey data or generated synthetically from this population foundation. All patient data is synthetic and de-identified, with no HIPAA, IRB, or PII concerns.
Digital Twins for HCP Research
HCP digital twins model physician decision-making, prescribing behavior, clinical attitudes, and treatment preferences. Physician research is one of the most expensive forms of market research, with recruitment costs ranging from $150 to $500+ per complete depending on specialty. Digital twins offer a way to get research-grade physician insights without the cost and timeline constraints of traditional HCP panels.
Use Cases
- Message testing for pharma: Test how physicians across different specialties respond to product messaging, clinical claims, and promotional materials. Seeded twins carry each physician's individual prescribing context and clinical priorities.
- Treatment preference studies: Model how physicians evaluate treatment options across efficacy, safety, cost, and convenience attributes, using conjoint-seeded twins that carry individual-level tradeoff data.
- Advisory board simulation: Build a panel of digital twin physicians representing your target specialties and query them on clinical topics, treatment protocols, and unmet needs. Iterate on questions without scheduling another advisory board.
- Pre-launch research: Before a drug launch, build a panel of HCP twins seeded with current prescribing behavior and treatment attitudes. Test launch messaging, competitive positioning, and adoption scenarios against twins that behave like the physicians you are trying to reach.
- ATU and brand tracking: Seed twins from your latest awareness, trial, and usage (ATU) wave. Query them between waves to get interim reads on brand perception shifts, competitive dynamics, and promotional effectiveness.
Simsurveys' HCP model is trained on a database of all licensed U.S. physicians linked to prescription history, covering 15+ specialties. HCP twins can be targeted by specialty, prescribing volume, practice setting, geography, and formulary environment. Validation studies have demonstrated statistical equivalence with live HCP panels across multiple therapeutic areas.
How Seeded Digital Twins Work: Step by Step
The process for building seeded digital twins follows a clear pipeline, regardless of whether the twins represent consumers, patients, or HCPs.
Step 1: Run the seed study. This can be a conjoint, a quantitative survey, a qualitative study, or any research that produces individual-level data. The key requirement is that each respondent needs enough observed data to build a meaningful preference profile.
Step 2: Build individual preference profiles. For conjoint data, this means running HB-MNL estimation to extract individual-level part-worth utilities. For survey data, the profile is built from the respondent's demographics, attitudes, and response patterns. For qualitative data, key themes and positions are encoded into a structured profile.
Step 3: Create the digital twin. Each preference profile is translated into a persistent twin that an AI model can condition on. The twin carries its seed data as a structured context that shapes all future responses. The translation step is critical because AI models reason well over structured semantic profiles but poorly over raw utility matrices or unprocessed survey data.
Step 4: Query the twins. Once the twins exist, you can ask them new questions at any time: message tests, concept evaluations, follow-up surveys, or scenario simulations. Each twin's response is conditioned on its individual profile, producing answers that are attitudinally consistent with its seed data rather than generic population averages.
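The translation in Step 3 can be sketched as follows. This is one possible approach, not a description of Simsurveys' pipeline: it converts raw part-worths into attribute importances using the common conjoint convention (the range of part-worths within an attribute, divided by the sum of ranges across attributes) and then writes a short text profile a model could condition on. The function names, attribute names, and numbers are hypothetical.

```python
def attribute_importance(partworths: dict[str, dict[str, float]]) -> dict[str, float]:
    """Importance = range of part-worths within an attribute / sum of all ranges."""
    ranges = {
        attr: max(levels.values()) - min(levels.values())
        for attr, levels in partworths.items()
    }
    total = sum(ranges.values())
    return {attr: r / total for attr, r in ranges.items()}


def to_semantic_profile(respondent_id: str,
                        partworths: dict[str, dict[str, float]]) -> str:
    """Turn a utility table into a short, readable preference summary."""
    importance = attribute_importance(partworths)
    ranked = sorted(importance, key=importance.get, reverse=True)
    lines = [f"Respondent {respondent_id} preference summary:"]
    for attr in ranked:
        levels = partworths[attr]
        best = max(levels, key=levels.get)  # most preferred level
        lines.append(
            f"- {attr}: {importance[attr]:.0%} of decision weight; prefers '{best}'"
        )
    return "\n".join(lines)


# Invented part-worths for a price-sensitive respondent.
example = {
    "price": {"$20": 1.5, "$40": -1.5},
    "brand": {"trusted": 0.2, "generic": -0.2},
}
profile = to_semantic_profile("247", example)
```

The resulting text lists price first, since it carries most of the decision weight in this example, and that ordering is the kind of structured signal a language model can use more readily than a bare matrix of numbers.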
Validation and Accuracy
The value of digital twins depends entirely on whether their responses are accurate. Simsurveys validates digital twin output using the same framework applied to all synthetic data: head-to-head comparison against real-world benchmark studies.
Validation studies have been conducted across multiple domains:
- HCP research: Synthetic physician responses validated against the AMA Prior Authorization Survey, Commonwealth Fund Primary Care Survey, and specialty-specific physician panels across cardiology, oncology, and primary care.
- Patient research: Synthetic patient data validated against the KFF GLP-1 Drug Poll (n=1,327), US Pain Foundation Survey (n=2,275), and HCAHPS hospital experience data.
- Consumer research: Synthetic consumer responses validated against the IFIC Food and Health Survey, NRF/Happy Returns consumer study, and Walmart Retail Rewired 2025 study.
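Across these validations, synthetic responses consistently meet statistical equivalence benchmarks, with KL divergence scores between 0.05 and 0.09 on structured questions (lower values indicate closer agreement between the synthetic and benchmark response distributions, and 0 means they are identical). Full validation reports are available for each study on the Simsurveys papers page.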
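For readers unfamiliar with the metric, here is a minimal sketch of a KL divergence calculation for a single closed-ended question. The response counts are invented, and the choices of direction (benchmark relative to synthetic), natural-log units, and a small smoothing constant for empty categories are assumptions for this example; the validation reports may define these details differently.

```python
import math


def to_distribution(counts: list[int], eps: float = 1e-9) -> list[float]:
    """Convert raw counts to proportions, smoothing so no category is exactly zero."""
    total = sum(counts)
    probs = [(c / total) + eps for c in counts]
    norm = sum(probs)
    return [p / norm for p in probs]


def kl_divergence(benchmark_counts: list[int], synthetic_counts: list[int]) -> float:
    """KL(benchmark || synthetic) in nats over the same ordered answer options."""
    p = to_distribution(benchmark_counts)
    q = to_distribution(synthetic_counts)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Invented counts for a 5-point scale question, n = 1,000 in each source.
benchmark = [120, 340, 280, 180, 80]
synthetic = [100, 360, 300, 170, 70]

score = kl_divergence(benchmark, synthetic)
```

KL divergence is not symmetric, so swapping the two arguments gives a different value, which is why the direction should be stated alongside any reported score.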
When to Use Purely Synthetic vs. Seeded Twins
Both approaches have their place. The right choice depends on what data you have and what you are trying to accomplish.
Use purely synthetic twins when:
- You have no existing data for your target population
- You are in an early exploratory phase and need directional insights fast
- The research question is broad (general attitudes, category-level preferences)
- Recruitment for real respondents is impractical (rare diseases, niche specialties, hard-to-reach populations)
Use seeded twins when:
- You have existing survey, conjoint, or qualitative data to build from
- You need individual-level precision, not just segment averages
- You are running iterative studies (multiple rounds of concept or message testing)
- You want to augment an existing panel by asking new questions without re-fielding
- You need consistency across multiple studies (the same twins evaluated under different conditions)
In practice, many research programs use both: a purely synthetic study to explore the landscape quickly, followed by a seeded approach built from a smaller live study to get individual-level precision where it matters most.
Getting Started
Digital twins for market research are available now through Simsurveys. You can build purely synthetic twins for any target population using our consumer, patient, or HCP models, or seed twins from your existing survey or conjoint data for individual-level precision.
Whether you are a pharma team running iterative HCP message testing, a brand team optimizing product concepts, or a patient insights group extending a small study into a larger analysis, digital twins let you get more research from less fieldwork.
Reach out for a demo, validation studies, or to discuss how digital twins fit into your research program.