Background
Chronic pain is one of the most studied — and most difficult to study — conditions in patient research. Patients are hard to recruit, tend to self-select into advocacy communities, and report experiences that vary enormously based on provider relationships, treatment history, and comorbid mental health conditions. Any synthetic model that claims to represent chronic pain patients needs to prove it can handle this complexity.
To test that claim, we validated the Simsurveys Patient model against the US Pain Foundation's "A Chronic Pain Crisis" survey, conducted between March and April 2022. The original study collected responses from 2,275 chronic pain patients recruited through the Pain Foundation's community network — a convenience sample of patients who are actively engaged with pain advocacy and support resources.
Study Design
We generated a matched synthetic sample of n=2,275 respondents, applying the same custom demographic quotas used in the original study: 87% female and 13% male, with ages concentrated in the 45–64 range (62% of the sample), a suburban majority (52%), and a notable rural representation of 26%. These quotas reflect the demographics of the Pain Foundation's community rather than the general chronic pain population, making the comparison a test of the model's ability to adapt to a specific population profile.
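Quota-matched sampling of this kind can be illustrated with a minimal sketch. The gender split (87/13), the 45–64 age share (62%), and the suburban (52%) and rural (26%) shares come from the study design above; the remaining age brackets and the urban share are illustrative assumptions, and the real pipeline may apply joint rather than independent marginal quotas:

```python
import random

random.seed(0)  # deterministic for illustration

N = 2275  # matched to the live sample size

# Quotas from the study design. Values marked "assumed" are NOT from the
# report; they are placeholders so the marginals sum to 1.
gender_quota = {"female": 0.87, "male": 0.13}
age_quota = {"18-44": 0.20, "45-64": 0.62, "65+": 0.18}      # 18-44 and 65+ shares assumed
geo_quota = {"suburban": 0.52, "rural": 0.26, "urban": 0.22}  # urban share assumed

def sample_quota(quota, n):
    """Draw n category labels with probabilities given by the quota dict."""
    categories, weights = zip(*quota.items())
    return random.choices(categories, weights=weights, k=n)

respondents = [
    {"gender": g, "age": a, "geo": geo}
    for g, a, geo in zip(
        sample_quota(gender_quota, N),
        sample_quota(age_quota, N),
        sample_quota(geo_quota, N),
    )
]

share_female = sum(r["gender"] == "female" for r in respondents) / N
print(f"{len(respondents)} respondents, {share_female:.0%} female")
```

With n=2,275, the realized shares land within a fraction of a percentage point of the target quotas; a production pipeline would typically also enforce exact cell counts rather than sampling freely.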
The survey instrument covered 17 questions organized into three thematic clusters: Mental Health & Stigma (4 questions), Patient-Provider Relationship (7 questions), and Treatment Experience (6 questions). Question types ranged from binary yes/no items to multi-point ordinal scales and multi-select lists.
Overall Results
Across all 17 questions, the average KL divergence was 0.029, with a median of 0.011. Every question achieved a "Good" rating, defined as a KL divergence below 0.15, and the model reproduced chronic pain patient response distributions with high fidelity across all three thematic clusters. This is the strongest overall validation result we have published for the Patient model to date.
Where the Model Excelled
The strongest results came from binary and categorical questions about stigma and treatment experience. Stigma by providers scored a KL divergence of 0.000 — the synthetic and live distributions were effectively identical. Side effects preventing treatment (0.000), prescription medication use (0.004), opioid use (0.007), and pain contracts (0.003) all produced near-perfect distributional matches.
Provider relationship questions also performed well. Comfort discussing treatments scored 0.007, feeling listened to by providers scored 0.016, and team approach to care scored 0.011. These are ordinal scale items where models typically struggle with central tendency bias, but the Patient model maintained accurate spread across response categories.
Where the Model Was Weaker
The two weakest items were both in the Mental Health & Stigma cluster: anxiety symptoms (KL 0.105) and depression symptoms (KL 0.128). In both cases, the model over-estimated the frequency of reported symptoms compared to the live sample. Both scores still fall well within the "Good" threshold, but they represent the largest divergences in the study.
This pattern is consistent with what we observe across patient validation studies. The model tends to slightly over-estimate mental health symptom frequency, likely because the federal health data sources that inform it, such as NHIS and BRFSS, capture higher base rates of anxiety and depression among chronic condition populations than a convenience sample of engaged advocacy community members reports. Additionally, ordinal provider-relationship scales showed mild central tendency compression, though not enough to push any item out of the "Good" range.
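Central tendency compression can be quantified by comparing the spread of the synthetic and live distributions on an ordinal scale. A minimal sketch with hypothetical 5-point scale data (the distributions below are illustrative, not from the study):

```python
# Hypothetical 5-point ordinal response distributions (illustrative only).
live = [0.10, 0.15, 0.20, 0.30, 0.25]
synthetic = [0.07, 0.14, 0.30, 0.29, 0.20]  # slightly compressed toward the middle

def variance(dist):
    """Variance of an ordinal distribution over scale points 1..k."""
    mean = sum((i + 1) * p for i, p in enumerate(dist))
    return sum(p * ((i + 1) - mean) ** 2 for i, p in enumerate(dist))

print(f"live variance:      {variance(live):.3f}")
print(f"synthetic variance: {variance(synthetic):.3f}")
# A lower synthetic variance indicates central tendency compression:
# probability mass has shifted from the endpoints toward the midpoint.
```

Entropy would work equally well as a spread measure; the point is simply that compression shows up as reduced dispersion even when the KL divergence remains small.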
Interpreting the Results
The US Pain Foundation survey represents a specific and self-selected population: patients who are connected to a national advocacy organization and motivated enough to complete a survey about their care experience. This is not a probability sample of all chronic pain patients. The fact that the synthetic model reproduced these distributions with an average KL of 0.029 suggests that it can adapt to targeted population profiles, not just general patient populations.
The pattern of results — strong on stigma and treatment items, slight over-estimation on mental health frequency, mild central tendency on ordinal scales — is informative for researchers planning future studies. Binary and categorical treatment questions can be used with high confidence. Mental health frequency items should be interpreted with awareness that synthetic estimates may skew slightly high relative to engaged patient communities.
Access the Full Report
The complete validation report, including question-level distribution tables, KL divergence scores, and cluster-level analysis, is available for download as a PDF. For more information about the Patient model, visit the Patient model page.