Methodology & honesty

How a Crowdproof study works — and where to doubt it.

Crowdproof produces simulated evidence. This page is the contract for what that evidence is, how it is made, and what it can and cannot tell you. The short version: numbers come with ranges, artifacts come with provenance, and a wide range means we don’t know yet.

methodology v2.3 · changelog public
Provenance

Every artifact is labeled with what it is.

Nothing in Crowdproof pretends to be a human. Three labels cover everything the product shows you, and they are printed on the artifact itself — not in a footnote.

simulated: Interviews, funnels, and every derived number. Generated by AI agents; each transcript carries the agent’s ID and model version.
twin-operated: Walkthrough replays and click paths. Recorded while agents operate a scanned twin of the site under test (real interface, simulated operator).
live: A run in progress. The coral live badge is reserved for running simulations; once a run completes, its outputs revert to the labels above.
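
As a minimal sketch, the labeling rules can be modeled as a small record type. `Provenance`, `Artifact`, and `finalize` are hypothetical names chosen for illustration, not Crowdproof's schema. The sketch encodes only what the list states: agent-generated artifacts carry the agent's ID and model version, and a `live` label reverts to a permanent label when the run completes.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class Provenance(Enum):
    SIMULATED = "simulated"          # interviews, funnels, derived numbers
    TWIN_OPERATED = "twin-operated"  # replays and click paths on a scanned twin
    LIVE = "live"                    # a run still in progress


@dataclass(frozen=True)
class Artifact:
    kind: str                            # e.g. "transcript", "replay", "funnel"
    provenance: Provenance
    agent_id: Optional[str] = None       # set on agent-generated artifacts
    model_version: Optional[str] = None

    def label(self) -> str:
        """The label printed on the artifact itself, not in a footnote."""
        parts = [self.provenance.value]
        if self.agent_id:
            parts.append(f"agent {self.agent_id}")
        if self.model_version:
            parts.append(f"model {self.model_version}")
        return ", ".join(parts)


def finalize(artifact: Artifact, twin_operated: bool) -> Artifact:
    """When a run completes, a 'live' artifact reverts to a permanent label."""
    if artifact.provenance is not Provenance.LIVE:
        return artifact
    final = Provenance.TWIN_OPERATED if twin_operated else Provenance.SIMULATED
    return replace(artifact, provenance=final)
```
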
The society

A million agents, seeded from the census.

The panel is synthetic, but its shape is not invented. Agents are seeded from US census data (ACS-2024) with occupation, income, and media diet, so the mix of people your launch meets resembles the population it would meet in the world.
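
Here is a minimal sketch of seeding a panel from marginals. It assumes each attribute is drawn independently. The weights in `MARGINALS` are made-up placeholders, not ACS-2024 figures, and `seed_agent` and `seed_panel` are hypothetical names.

```python
import random

# Placeholder marginals: illustrative weights only, NOT real ACS-2024 figures.
MARGINALS = {
    "occupation": {"management": 0.2, "service": 0.2, "sales_office": 0.2,
                   "trades": 0.2, "other": 0.2},
    "income_bracket": {"<35k": 0.3, "35-75k": 0.3, "75-150k": 0.25, ">150k": 0.15},
    "region": {"northeast": 0.2, "midwest": 0.2, "south": 0.4, "west": 0.2},
    "media_diet": {"social_heavy": 0.4, "news_heavy": 0.3,
                   "video_heavy": 0.2, "low_media": 0.1},
}


def seed_agent(rng: random.Random) -> dict:
    """Draw each attribute independently from its marginal distribution."""
    return {
        attr: rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
        for attr, dist in MARGINALS.items()
    }


def seed_panel(n: int, seed: int = 0) -> list:
    """A reproducible panel of n agents for a given seed."""
    rng = random.Random(seed)
    return [seed_agent(rng) for _ in range(n)]
```

Independent draws reproduce each marginal but not the correlations between attributes (income and occupation, for example). Matching that joint structure would require joint tables or a reweighting method such as raking, which this sketch does not attempt.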

seeding: Each agent draws demographics from ACS-2024 marginals: occupation, income bracket, region, and a media-diet profile that governs which simulated channels can reach them.
cohorts: The cohort composition for every run is printed on its report. If a study targets a segment (e.g. North American PMs), the seeding is re-weighted and the report says so.
diffusion: Launch studies propagate through six simulated social channels using an SIR-style social diffusion model (SIR-social v2.3). Reach is a function of the network, not a dial we set.
limits: A census-shaped panel is still a model of people, not people. Tastes, fatigue, and fashion shift faster than census tables; treat segment-level results as directional and check the calibration report for drift.
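
This page does not specify SIR-social v2.3 itself. What follows is a generic discrete-time SIR process on a random multiplex network. It assumes six placeholder channels, a transmission rate per channel, and a media-diet gate on which edges can form. Every name and rate in it is hypothetical. It illustrates the point made in the diffusion item: reach comes out of the network and the rates, and nothing in the code sets it directly.

```python
import random

# Six placeholder channels; real channel names and rates are not published here.
CHANNELS = [f"channel_{i}" for i in range(6)]


def build_network(n: int, edges_per_agent: int, rng: random.Random) -> list:
    """Random multiplex graph. Each agent subscribes to a random subset of
    channels (a stand-in for its media diet); an edge forms only over a
    channel both endpoints subscribe to, and is tagged with that channel."""
    subs = [set(rng.sample(CHANNELS, rng.randint(1, len(CHANNELS)))) for _ in range(n)]
    neighbours = [[] for _ in range(n)]
    for u in range(n):
        for _ in range(edges_per_agent):
            v = rng.randrange(n)
            shared = subs[u] & subs[v]
            if v != u and shared:
                ch = rng.choice(sorted(shared))  # sorted for reproducibility
                neighbours[u].append((v, ch))
                neighbours[v].append((u, ch))
    return neighbours


def simulate_sir(neighbours: list, beta: dict, gamma: float, seeds: list,
                 rng: random.Random, max_steps: int = 200) -> float:
    """Discrete-time SIR. S = has not heard, I = actively sharing,
    R = stopped sharing. Each step, every sharer passes the launch along
    each edge with that edge's channel rate beta[ch], then stops sharing
    with probability gamma. Returns reach: the share of agents ever in I."""
    n = len(neighbours)
    state = ["S"] * n
    for s in seeds:
        state[s] = "I"
    ever = set(seeds)
    for _ in range(max_steps):
        sharers = [u for u in range(n) if state[u] == "I"]
        if not sharers:
            break
        newly = set()
        for u in sharers:
            for v, ch in neighbours[u]:
                if state[v] == "S" and rng.random() < beta[ch]:
                    newly.add(v)
        for u in sharers:
            if rng.random() < gamma:
                state[u] = "R"
        for v in newly:
            state[v] = "I"
        ever |= newly
    return len(ever) / n


if __name__ == "__main__":
    rng = random.Random(7)
    net = build_network(2000, edges_per_agent=3, rng=rng)
    rates = {ch: 0.3 for ch in CHANNELS}  # made-up rates
    print(simulate_sir(net, rates, gamma=0.5, seeds=[0, 1, 2, 3, 4], rng=rng))
```
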
Product twins

Agents use the real thing, scanned.

Most synthetic research tells agents about your product. Crowdproof scans it into an operable twin — pages, flows, pricing tables, empty states — and lets agents click through it. Twins work on any live site — your own product, a storefront, even a competitor's — and a session measures whatever the question needs, from task completion and friction to first impressions and return intent.

scanning: We build the twin from whatever you submit, whether a URL, screenshots, or a PRD. The twin is versioned; every run records exactly which twin version it executed against.
your approval: No agent touches a twin before you approve it. If the twin misrepresents your product, the study will measure the wrong thing, so the approval gate is yours, not ours.
replays: Walkthrough sessions are recorded and replayable, down to every click, every stall, and the exact moment an agent leaves. Default depth records 128 sessions per study.
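
The approval gate and version pinning can be sketched as a check at run start. `TwinVersion`, `RunRecord`, `start_run`, and `rescan` are hypothetical names. One rule is our own assumption rather than something stated above: a re-scan produces a new, unapproved version. We assume it so that the gate covers every version an agent could touch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwinVersion:
    twin_id: str
    version: int
    approved: bool = False


class TwinNotApproved(Exception):
    """Raised when a run targets a twin version the owner has not approved."""


@dataclass(frozen=True)
class RunRecord:
    run_id: str
    twin_id: str
    twin_version: int   # the exact twin version this run executed against
    methodology: str    # e.g. "v2.3"


def rescan(twin: TwinVersion) -> TwinVersion:
    """Assumption: every re-scan yields a new version that needs fresh approval."""
    return TwinVersion(twin.twin_id, twin.version + 1, approved=False)


def start_run(run_id: str, twin: TwinVersion, methodology: str = "v2.3") -> RunRecord:
    """Refuse unapproved twins; otherwise pin the twin and methodology versions."""
    if not twin.approved:
        raise TwinNotApproved(f"{twin.twin_id} v{twin.version} is not approved")
    return RunRecord(run_id, twin.twin_id, twin.version, methodology)
```
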
Interviews

Stratified, transcribed, attached in full.

After agents act, we ask them why. Interviewees are drawn by stratified sampling across segments, so loud cohorts don't drown out quiet ones, and the transcripts ship with the report in full, never summarized away.
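
This page does not state the allocation rule. The sketch below assumes equal allocation per segment, capped at each segment's size, with leftover slots redistributed. That is one way to keep a small segment from being drowned out by a large one. `allocate` and `stratified_sample` are hypothetical names.

```python
import random


def allocate(sizes: dict, n: int) -> dict:
    """Split n interviews as evenly as possible across segments, never
    asking more agents than a segment contains; leftover slots go to
    segments that still have unsampled agents."""
    alloc = {seg: 0 for seg in sizes}
    remaining = min(n, sum(sizes.values()))
    while remaining > 0:
        open_segs = [s for s in sorted(sizes) if alloc[s] < sizes[s]]
        share, extra = divmod(remaining, len(open_segs))
        for i, s in enumerate(open_segs):
            take = min(share + (1 if i < extra else 0), sizes[s] - alloc[s])
            alloc[s] += take
            remaining -= take
    return alloc


def stratified_sample(agents_by_segment: dict, n: int = 128, seed: int = 0) -> dict:
    """Sample without replacement inside each segment per the allocation."""
    rng = random.Random(seed)
    sizes = {s: len(a) for s, a in agents_by_segment.items()}
    alloc = allocate(sizes, n)
    return {s: rng.sample(agents_by_segment[s], alloc[s]) for s in sorted(agents_by_segment)}
```

Under this rule, a segment of 20 agents contributes all 20 of its agents at n = 128, even next to a segment of 9,000. Under purely proportional allocation, that segment would get almost none.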

sampling: Stratified across behavioral segments (e.g. churned at pricing, retained past first session). Default depth runs n = 128 interviews per study.
transcripts: Full transcripts are attached to every report, each labeled simulated with the agent's ID and model version. Quotes in summaries link back to their transcript.
churn reasons: Reasons are ranked by segment from the interview corpus, and each funnel drop is annotated with its top-ranked reason. Rankings are claims about the simulation; verify the ones you act on.
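
Here is a minimal sketch of per-segment ranking. It assumes each interview has already been coded into `(segment, reason)` pairs. That coding step is hypothetical, as are the names `rank_reasons` and `annotate_drops`. Ranking here is plain frequency with alphabetical tie-breaks, which may differ from how the report actually ranks.

```python
from collections import Counter, defaultdict


def rank_reasons(coded: list) -> dict:
    """coded: (segment, reason) pairs, one per interview mention.
    Returns (reason, count) per segment, most frequent first,
    ties broken alphabetically."""
    counts = defaultdict(Counter)
    for segment, reason in coded:
        counts[segment][reason] += 1
    return {
        seg: sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))
        for seg, c in counts.items()
    }


def annotate_drops(drops: dict, ranked: dict) -> dict:
    """drops maps a funnel stage to the segment that churned there;
    each drop gets that segment's top-ranked reason."""
    return {stage: ranked[seg][0][0] for stage, seg in drops.items() if ranked.get(seg)}
```
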
Seeds & uncertainty

The range is the result.

Every study executes across multiple seeds: independent re-runs of the same question. We report the P10–P90 spread across seeds rather than a point estimate. Confidence intervals within a single run come from bootstrap resampling (10K resamples).
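
Within a single run, a percentile bootstrap for a retention rate looks roughly like the sketch below. It assumes per-agent retained/not-retained outcomes and a 95% level; this page does not state the level. `bootstrap_ci` and `percentile` are hypothetical names.

```python
import random


def percentile(xs_sorted: list, q: float) -> float:
    """Linear-interpolated percentile of already-sorted data, q in [0, 100]."""
    pos = (len(xs_sorted) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs_sorted) - 1)
    return xs_sorted[lo] + (xs_sorted[hi] - xs_sorted[lo]) * (pos - lo)


def bootstrap_ci(retained: list, resamples: int = 10_000,
                 level: float = 0.95, seed: int = 0) -> tuple:
    """Percentile bootstrap CI for a retention rate within one run.

    Each resample draws len(retained) agents with replacement and
    recomputes the rate; the CI is the central `level` share of those rates.
    """
    rng = random.Random(seed)
    n = len(retained)
    rates = sorted(sum(rng.choices(retained, k=n)) / n for _ in range(resamples))
    tail = (1 - level) / 2 * 100
    return percentile(rates, tail), percentile(rates, 100 - tail)
```

This interval captures sampling noise inside one run. Disagreement between seeds is a separate source of spread, and the across-seed P10–P90 band reports it.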

multi-seed: The default full study runs 9 seeds. If the seeds disagree, the range is wide. That is the instrument telling you the answer is unstable, not a rendering choice.
reading a band: “Retained 18.0% (P10 14.2 – P90 22.4)” means that across re-runs, 80% of outcomes landed inside that band. Plan against the edges you can't afford, not the midpoint.
wide ranges: A wide range means “not enough signal”, and the report says exactly that, along with what would narrow it: more seeds, a deeper pipeline stage, or a tighter cohort. We'd rather you wait than decide on noise.
Figure: retained %, 9 seeds, run #213. Reported: 18.0% (P10 14.2 – P90 22.4).
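
The across-seed band can be sketched with the standard library. This rests on several assumptions. The headline number is taken to be the median across seeds, which this page does not specify. P10 and P90 use linear interpolation. The 10-point width threshold for "not enough signal" is a made-up placeholder. `seed_band` and `report` are hypothetical names.

```python
import statistics


def seed_band(per_seed: list) -> tuple:
    """Median and P10/P90 across independent seeds (linear interpolation)."""
    deciles = statistics.quantiles(per_seed, n=10, method="inclusive")
    return statistics.median(per_seed), deciles[0], deciles[8]


def report(metric: str, per_seed: list, max_width: float = 10.0) -> str:
    """Render the headline with its band; flag bands wider than max_width
    percentage points (placeholder threshold)."""
    mid, p10, p90 = seed_band(per_seed)
    line = f"{metric} {mid:.1f}% (P10 {p10:.1f} \u2013 P90 {p90:.1f})"
    if p90 - p10 > max_width:
        line += "; not enough signal: more seeds, a deeper stage, or a tighter cohort would narrow it"
    return line
```

With nine seeds at 14, 15, ..., 22, `report("Retained", ...)` renders "Retained 18.0% (P10 14.8 – P90 21.2)". The band is under 10 points wide, so it carries no flag.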

Calibration

Backtested against launches with known outcomes.

The model earns trust by being checked, not by being confident. We backtest against real launches and campaigns whose outcomes are known, and publish the calibration report quarterly.

the claim: Directional accuracy at funnel scale. The funnel shape, the leak points, and the segment rankings hold up; the claim is never person-level prediction.
the check: Each quarter, completed backtests compare simulated funnels to observed ones. Where the simulation drifts, the report says where and by how much, and the model notes inherit the caveat.
versioning: Methodology changes are versioned (you are reading v2.3) with a public changelog. Reports always state the methodology and twin versions they ran under, so old results stay interpretable.
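
This page does not say exactly which quantities the backtest compares. The sketch below assumes it compares stage-to-stage conversion rates, reports drift in percentage points, and treats the leak point as the transition with the lowest conversion. `step_rates` and `backtest` are hypothetical names, and any funnel counts fed to them in an example are made up.

```python
def step_rates(funnel: list) -> list:
    """Stage-to-stage conversion: share of stage i that reaches stage i+1."""
    return [b / a for a, b in zip(funnel, funnel[1:])]


def backtest(stages: list, simulated: list, observed: list) -> dict:
    """Compare a simulated funnel to an observed one, transition by transition."""
    sim, obs = step_rates(simulated), step_rates(observed)
    transitions = [f"{a}->{b}" for a, b in zip(stages, stages[1:])]
    # Drift per transition in percentage points (simulated minus observed).
    drift = {t: round((s - o) * 100, 1) for t, s, o in zip(transitions, sim, obs)}
    # Directional check: does the biggest leak land on the same transition?
    sim_leak = transitions[min(range(len(sim)), key=sim.__getitem__)]
    obs_leak = transitions[min(range(len(obs)), key=obs.__getitem__)]
    return {"drift_pp": drift, "leak_matches": sim_leak == obs_leak, "observed_leak": obs_leak}
```

Step rates make funnels of different sizes comparable, since a simulated panel and a real launch rarely start from the same count.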
What we don't claim

The honest edges of the instrument.

no person-level prediction: No agent predicts what a specific human will do. The unit of meaning is the funnel and the segment, never the individual.
no guarantees: A favorable simulation is evidence, not a promise of success. Markets contain competitors, timing, and luck that no panel, simulated or human, can hold still.
no hidden uncertainty: We never trade a range for a cleaner-looking number. If a plan or report ever shows a simulated result without its interval, that is a bug; report it.
no human cosplay: Simulated interviews are labeled simulated, always. Honesty is not a paid feature: every plan, including Free, reports full ranges and methodology.
Confidentiality

Your unreleased work stays yours.

Twins, PRDs, and questions run in an isolated environment, are never used to train models, and can be deleted — along with every derived artifact — at any time.
