Methodology & honesty

How a Crowdproof study works — and where to doubt it.

Crowdproof produces simulated evidence. This page is the contract for what that evidence is, how it is made, and what it can and cannot tell you. The short version: numbers come with ranges, artifacts come with provenance, and a wide range changes how a decision is executed rather than preventing a decision.

methodology v2.3 · changelog public

Provenance

Every artifact is labeled with what it is.

Nothing in Crowdproof pretends to be a human. Three labels cover everything the product shows you, and they are printed on the artifact itself — not in a footnote.

simulated: Interviews, funnels, and every derived number. Generated by AI agents; each transcript carries the agent’s ID and model version.

twin-operated: Walkthrough replays and click paths. Recorded while agents operate a scanned twin of the site under test, so the interface is real and the operator is simulated.

live: A run in progress. The color coral is reserved for running simulations; once a run completes, its outputs revert to the labels above.

The society

A million agents, seeded from the census.

The panel is synthetic, but its shape is not invented. Agents are seeded from US census data (ACS-2024) with occupation, income, and media diet, so the mix of people your launch meets resembles the population it would meet in the world.

seeding: Each agent draws demographics from ACS-2024 marginals, covering age, gender, race/ethnicity, US region, education, occupation, income bracket, and a media-diet profile that governs which simulated channels can reach them.

cohorts: You can target a specific cohort by picking any mix of age, gender, race/ethnicity, US region, income, and education, or let the audience emerge. A selection draws up to 1,000 matching agents; if more than 1,000 match, it draws a representative sample (larger cohorts unlock as we scale). The cohort composition is printed on every report. The 1,000-agent size mirrors the representative panel in Stanford's Generative Agent Simulations of 1,000 People.

diffusion: Launch studies propagate through six simulated social channels using an SIR-style social diffusion model (SIR-social v2.3). Reach is a function of the network, not a dial we set.

limits: A census-shaped panel is still a model of people, not people. Tastes, fatigue, and fashion shift faster than census tables; treat segment-level results as directional and check the calibration report for drift.
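To make the diffusion idea concrete, here is a minimal sketch of one SIR-style cascade in Python. It is an illustration under stated assumptions, not the SIR-social v2.3 model itself: the function name, the `neighbors` adjacency map, and the `share_prob` and `fatigue_prob` parameters are all hypothetical.

```python
import random

def sir_cascade(neighbors, seeds, share_prob, fatigue_prob, rng, max_steps=50):
    """Run one SIR-style cascade and return the set of agents ever reached.

    neighbors: dict mapping each agent to a list of connected agents (assumed shape).
    share_prob / fatigue_prob: hypothetical per-step probabilities.
    """
    susceptible = set(neighbors) - set(seeds)  # have not seen the launch yet
    infected = set(seeds)                      # currently sharing it
    recovered = set()                          # saw it, stopped sharing
    for _ in range(max_steps):
        if not infected:
            break
        newly_infected = set()
        for agent in sorted(infected):         # sorted for reproducible draws
            for peer in neighbors[agent]:
                if peer in susceptible and rng.random() < share_prob:
                    newly_infected.add(peer)
        newly_recovered = {a for a in sorted(infected) if rng.random() < fatigue_prob}
        susceptible -= newly_infected
        infected = (infected | newly_infected) - newly_recovered
        recovered |= newly_recovered
    return infected | recovered

# Hypothetical five-agent network; agent 4 has no connections.
network = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1], 4: []}
reached = sir_cascade(network, seeds=[0], share_prob=0.3, fatigue_prob=0.5,
                      rng=random.Random(7))
print(sorted(reached))  # agent 4 can never appear: no edge leads to it
```

Reach here falls out of the edges in `network`, so changing the graph changes the outcome even when the probabilities stay the same.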

Product twins

Agents use the real thing, scanned.

Most synthetic research tells agents about your product. Crowdproof scans it into an operable twin — pages, flows, pricing tables, empty states — and lets agents click through it. Twins work on any live site — your own product, a storefront, even a competitor's — and a session measures whatever the question needs, from task completion and friction to first impressions and return intent.

scanning: We build the twin from whatever you submit, whether a URL, screenshots, or a PRD. The twin is versioned; every run records exactly which twin version it executed against.

your approval: No agent touches a twin before you approve it. If the twin misrepresents your product, the study would measure the wrong thing, so the approval gate is yours, not ours.

replays: Walkthrough sessions are recorded and replayable, down to every click, every stall, and the exact moment an agent leaves. Default depth records 128 sessions per study.

Interviews

Stratified, analyzed, decision-ready.

Synthetic interviews are distributed across the available audience segments so loud cohorts do not drown out quiet ones. The report presents cross-interview themes, tensions, and decision implications instead of burying the user in raw transcripts.

sampling: Interviews are stratified across the materialized audience segments. A study can interview up to 50 representatives, but that count is a maximum rather than a quota; the system stops once the available decision-relevant segments are covered.

transcripts: The management report contains the interview analysis, not question-by-question answers or raw quotes. Detailed simulated conversations remain available in the interview workspace when a reviewer needs to inspect one persona.

churn reasons: Needs, barriers, and choice drivers are ranked by segment and translated directly into product, pricing, positioning, and promotion decisions. Post-launch behavior continuously recalibrates those rankings.
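The sampling rule above (stratify by segment, treat 50 as a ceiling rather than a quota) can be sketched as follows. This is one plausible reading, not the product's actual selection logic: the function name, the one-representative-per-segment choice, and the `agents_by_segment` shape are assumptions made for illustration.

```python
import random

def pick_representatives(agents_by_segment, max_interviews=50, rng=None):
    """Pick one representative per non-empty segment, never more than the cap.

    agents_by_segment: dict of segment name -> list of agent IDs (assumed shape).
    """
    if rng is None:
        rng = random.Random(0)
    picks = []
    # Sorted order keeps the result reproducible; when there are more segments
    # than the cap, this sketch simply truncates, which a real system would
    # replace with a relevance ranking.
    for segment in sorted(agents_by_segment):
        if len(picks) >= max_interviews:
            break
        members = agents_by_segment[segment]
        if members:
            picks.append((segment, rng.choice(members)))
    return picks

# A loud segment (1,000 agents) gets the same single seat as a quiet one.
segments = {"loud": list(range(1000)), "mid": [1, 2, 3, 4, 5], "quiet": [9]}
print(pick_representatives(segments, rng=random.Random(0)))
```

With three segments the study runs three interviews, not 50, which is the sense in which the count is a maximum.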

Seeds & uncertainty

The range is the result.

Propagation scenarios execute as multiple stochastic cascades. We report the P10–P90 outcome spread rather than hiding uncertainty behind one point estimate.

multi-seed: Independent cascades expose how sensitive an outcome is to timing, network path, and initial conditions. A wider range lowers confidence and changes the rollout guardrails.

reading a band: “Retained 18.0% (P10 14.2 – P90 22.4)” is an assumption-bound scenario band. Choose the best-supported option, then plan safeguards against the downside edge rather than treating the midpoint as a promise.

wide ranges: A wide range does not end in a non-decision. The report still names the best-supported option, lowers its confidence, identifies the variables driving the spread, and sets the business signals that should trigger adjustment.

[Figure: retained % across 9 seeds, run #213, plotted on an axis from 10% to 26%. Reported: 18.0% (P10 14.2 – P90 22.4).]
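A band like the one above can be computed from per-seed outcomes with the Python standard library. The sketch below is illustrative only: the nine values are made up (they are not the data behind run #213), using the median as the central value is an assumption, and the function names are hypothetical.

```python
import statistics

def scenario_band(seed_outcomes):
    """Summarize per-seed outcomes as a P10 / central / P90 band."""
    # n=10 returns the nine decile cut points; first is P10, last is P90.
    deciles = statistics.quantiles(seed_outcomes, n=10, method="inclusive")
    p10, p90 = deciles[0], deciles[-1]
    return {
        "p10": p10,
        "central": statistics.median(seed_outcomes),  # assumed central value
        "p90": p90,
        "width": p90 - p10,  # wider band, lower confidence
    }

def format_band(label, band):
    """Render the band in the same style the reports use."""
    return (f"{label} {band['central']:.1f}% "
            f"(P10 {band['p10']:.1f} – P90 {band['p90']:.1f})")

# Made-up retained-% outcomes from nine hypothetical seeds.
outcomes = [18, 10, 26, 14, 22, 12, 24, 16, 20]
print(format_band("Retained", scenario_band(outcomes)))
# prints: Retained 18.0% (P10 11.6 – P90 24.4)
```

The `width` field is what a rollout plan would watch: the same central value with a wider band calls for tighter guardrails, not a different decision process.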

Calibration

Backtested against launches with known outcomes.

The model earns trust by being checked, not by being confident. We backtest against real launches and campaigns whose outcomes are known, and publish the calibration report quarterly.

the claim: Directional accuracy at funnel scale, meaning the funnel shape, the leak points, and the segment rankings hold up. The claim is never person-level prediction.

the check: Each quarter, completed backtests compare simulated funnels to observed ones. Where the simulation drifts, the report says where and by how much, and the model notes inherit the caveat.

versioning: Methodology changes are versioned (you are reading v2.3) with a public changelog. Reports always state the methodology and twin versions they ran under, so old results stay interpretable.
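The quarterly check compares a simulated funnel to an observed one. A minimal sketch of such a comparison follows; the stage names, counts, and function names are invented for illustration and do not come from a published calibration report. It assumes every stage count is positive.

```python
def stage_conversion(counts):
    """Step-to-step conversion rates for an ordered list of funnel counts."""
    return [after / before for before, after in zip(counts, counts[1:])]

def funnel_drift(stages, simulated, observed):
    """Return per-transition drift and whether the leak point matches.

    Drift is simulated rate minus observed rate, so a negative value means
    the simulation was too pessimistic at that step.
    """
    sim_rates = stage_conversion(simulated)
    obs_rates = stage_conversion(observed)
    transitions = [f"{a} -> {b}" for a, b in zip(stages, stages[1:])]
    drift = {t: s - o for t, s, o in zip(transitions, sim_rates, obs_rates)}
    # The leak point is the transition with the lowest conversion rate.
    sim_leak = transitions[min(range(len(sim_rates)), key=sim_rates.__getitem__)]
    obs_leak = transitions[min(range(len(obs_rates)), key=obs_rates.__getitem__)]
    return drift, sim_leak == obs_leak

# Hypothetical backtest: counts of agents (or users) reaching each stage.
stages = ["visit", "signup", "activate", "retain"]
drift, same_leak = funnel_drift(stages,
                                simulated=[1000, 400, 200, 36],
                                observed=[1000, 500, 200, 40])
for transition, delta in drift.items():
    print(f"{transition}: {delta:+.2f}")
print("leak point matches:", same_leak)
```

In this made-up case the simulation misjudges the first two steps by ten points each yet still identifies the same leak point, which is the kind of directional agreement the claim is about.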

What we don't claim

The honest edges of the instrument.

no person-level prediction: No agent predicts what a specific human will do. The unit of meaning is the funnel and the segment, never the individual.

no guarantees: A favorable simulation is evidence, not a promise of success. Markets contain competitors, timing, and luck that no panel, simulated or human, can hold still.

no hidden uncertainty: We never trade a range for a cleaner-looking number. If a plan or report ever shows a simulated result without its interval, that is a bug. Report it.

no human cosplay: Simulated interviews are labeled simulated, always. Honesty is not a paid feature; every plan, including Free, reports full ranges and methodology.

Confidentiality

Your unreleased work stays yours.

Twins, PRDs, and questions run in an isolated environment, are never used to train models, and can be deleted — along with every derived artifact — at any time.
