Why Sarcopenia?
Sarcopenia — the progressive loss of skeletal muscle mass and function associated with aging — is an increasingly important clinical topic. It affects an estimated 10–16% of older adults worldwide, contributes to falls, fractures, and loss of independence, and is gaining attention as a treatment target for pharmaceutical and nutrition companies. Yet physician awareness and screening practices remain inconsistent, making it a challenging but commercially important area for healthcare market research.
For Simsurveys, sarcopenia represents an ideal stress test for the Healthcare model. Unlike broad topics such as practice satisfaction or ACA attitudes, sarcopenia requires the model to demonstrate knowledge of a specific clinical domain — including screening tools, diagnostic criteria, prevalence estimates, and treatment recommendations. This validation asks whether synthetic physician respondents can replicate the knowledge, attitudes, and clinical behaviors of real physicians in a niche therapeutic area.
Study Design
We compared Simsurveys output against a live Physician Sarcopenia Survey fielded in December 2025. The live sample included n=253 physicians in primary care/general practice and internal medicine. We generated n=1,000 synthetic physician respondents using the Simsurveys Healthcare model.
The survey covered sarcopenia familiarity, prevalence estimates, terminology preferences, screening practices, diagnostic tools, and treatment recommendations. For single-select questions, we measured alignment using KL Divergence, where lower values indicate a closer match between the synthetic and live answer distributions. For multi-select questions — where respondents select all options that apply — we used Rank-Biased Overlap (RBO) with a persistence parameter of p=0.9, which measures how well the model reproduces the rank ordering of selected options; RBO ranges from 0 to 1, with 1 indicating an identical ordering.
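For readers who want to see how these two metrics behave, the short Python sketch below computes both. It is illustrative only: the function names and toy data are not part of the validation tooling, and the RBO shown is a simplified, depth-truncated variant normalized so that identical rankings score 1.0, rather than the published extrapolated estimator.

import numpy as np

def kl_divergence(live_shares, synthetic_shares, eps=1e-12):
    # D_KL(live || synthetic) over a single-select question's answer shares.
    p = np.asarray(live_shares, dtype=float)
    q = np.asarray(synthetic_shares, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    # A small epsilon guards against zero shares on either side.
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def rbo(live_order, synthetic_order, p=0.9):
    # Simplified, depth-truncated Rank-Biased Overlap for two rankings of the
    # same option set: a geometrically weighted average of top-d overlaps.
    k = min(len(live_order), len(synthetic_order))
    weighted = sum(
        (p ** (d - 1)) * len(set(live_order[:d]) & set(synthetic_order[:d])) / d
        for d in range(1, k + 1)
    )
    max_weighted = sum(p ** (d - 1) for d in range(1, k + 1))
    return weighted / max_weighted

# Toy data for illustration only (not the actual survey values).
live = [0.42, 0.31, 0.18, 0.09]
synthetic = [0.40, 0.33, 0.17, 0.10]
print(round(kl_divergence(live, synthetic), 3))

live_rank = ["recent fall", "unintended weight loss", "hospital discharge"]
synthetic_rank = ["recent fall", "hospital discharge", "unintended weight loss"]
print(round(rbo(live_rank, synthetic_rank), 3))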
Results: Strong on Practice Patterns, Weaker on Niche Facts
The model showed a clear performance split between questions about clinical practice patterns and questions about specialized factual knowledge. On practice-oriented questions, the results were strong. Term familiarity achieved a KL Divergence of 0.044, and screening time scored 0.080. Patient demographic questions performed exceptionally well, with KL values ranging from 0.009 to 0.027. "Where patients live," a question about the care settings in which sarcopenia patients are encountered, scored 0.027.
The multi-select questions showed particularly impressive results. Life events prompting screening achieved an RBO of 0.981 — meaning the model's rank ordering of triggering events was nearly identical to what live physicians reported. Treatment recommendations scored an RBO of 0.900. Primary motivations for screening: 0.882. And patient failure reasons — the reasons patients fail to follow through on sarcopenia treatment — achieved a perfect RBO of 1.000, with the simulated physicians ranking every failure reason in exactly the same order as the live panel.
Perfect rank ordering on patient failure reasons (RBO 1.000). Near-perfect on life events prompting screening (RBO 0.981) and treatment recommendations (RBO 0.900). The model captures how physicians think about sarcopenia clinical workflows.
Where the Model Struggled
The weaker results appeared consistently on questions requiring niche factual knowledge rather than clinical judgment. Prevalence estimates showed a KL Divergence of 0.273 — the model did not accurately reproduce how physicians estimate sarcopenia prevalence in their patient populations. Terminology preference scored 0.242, suggesting the model was uncertain about which clinical terms physicians actually use for the condition.
Initial concern expression — how physicians first raise the topic of muscle loss with patients — showed a KL Divergence of 0.297. Geriatric training status was the weakest question at 0.662, indicating the model could not accurately predict which physicians had completed specialized geriatric training. Follow-up frequency scored 0.208.
The pattern is consistent and informative: the Healthcare model excels on questions about clinical practice patterns, decision-making workflows, and treatment approaches. It struggles with precise factual estimates (prevalence rates), personal training history (geriatric certification), and specific communication behaviors (how doctors phrase their initial concerns). The former reflects general medical knowledge and clinical reasoning; the latter requires specialized, population-specific data that the model has not been trained on.
Multi-Select vs. Single-Select Performance
An interesting methodological finding from this validation is the difference in performance between single-select and multi-select question formats. Multi-select questions — measured by RBO — showed consistently stronger results than single-select questions measured by KL Divergence. This suggests that the model is better at capturing the relative importance of different clinical factors (which screening triggers matter most, which treatments are recommended first) than at reproducing exact distributional shapes on single-choice items.
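Reusing the illustrative kl_divergence and rbo helpers sketched above (hypothetical names, not the validation tooling), a quick comparison shows why this can happen: two answer distributions can share exactly the same rank ordering while differing enough in magnitude to register a meaningful KL Divergence. The values below are toy numbers chosen to make the point, not survey data.

# Same rank order, different magnitudes: illustrative values only.
live = [0.50, 0.30, 0.15, 0.05]
synthetic = [0.35, 0.30, 0.25, 0.10]
print(round(kl_divergence(live, synthetic), 3))  # nonzero: shapes differ

options = ["option A", "option B", "option C", "option D"]

def to_ranking(shares):
    # Order options from most- to least-selected.
    return [o for _, o in sorted(zip(shares, options), reverse=True)]

print(rbo(to_ranking(live), to_ranking(synthetic)))  # 1.0: identical ordering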
For healthcare market research teams, this has practical implications for survey design. When using synthetic HCP data for therapeutic area research, multi-select questions about clinical priorities, treatment hierarchies, and decision factors are likely to produce more accurate results than single-select questions about precise prevalence estimates or training backgrounds.
Implications for Therapeutic Area Research
These results establish both the strengths and the boundaries of synthetic physician data for niche therapeutic areas. The Healthcare model is strong enough to provide reliable directional insights on physician attitudes, clinical workflows, treatment preferences, and screening behaviors. For research questions that require precise prevalence estimates or specialized training demographics, live panel data remains the better option.
The full validation report, including question-level metrics and distribution comparisons, is available for download. For more on the Healthcare model, visit the Healthcare model page.