Orbit web apps
Work out how many users you need in each arm before you can call a test. Takes baseline rate, minimum detectable effect, confidence, and power. Shows sensitivity across MDE values.
Your current rate on the control (e.g. 3.5 for 3.5%)
Smallest lift worth calling a win. 10% relative on a 3.5% baseline = detect 3.85%.
How many recipients you send to per day. Used to project test duration.
Required sample size
45,373
per arm · 90,746 total (control + 1 variant)
Detect lift from 3.50% → 3.85%
Two-proportion z-test, 95% confidence, 80% power. At this size, a true 10% relative lift (the operator standard) will be detected roughly four times out of five.
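For readers who want to check the headline number themselves, here is a minimal sketch of the two-proportion sample-size calculation using only Python's standard library. The function name and the unpooled-variance choice are mine, not the tool's; implementations differ slightly in pooling and rounding, so this sketch lands within a few dozen of the 45,373 shown above rather than matching it exactly.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate per-arm N for a two-sided two-proportion z-test.

    baseline:      control conversion rate, e.g. 0.035 for 3.5%
    relative_mde:  smallest relative lift worth detecting, e.g. 0.10
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)          # 3.5% -> 3.85% at a 10% MDE
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # power term
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # unpooled variance
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n)

print(sample_size_per_arm(0.035, 0.10))  # roughly 45,000-46,000 per arm
```

Multiply by two for the total across control plus one variant.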
Sensitivity — how N changes with MDE
Smaller lifts need much larger samples. If you want to detect a 5% relative lift on a low baseline, you'll need months of volume.
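The sensitivity claim follows from the formula itself: required N scales roughly with 1/MDE², so halving the detectable lift roughly quadruples the sample. A self-contained sketch, looping the same standard calculation over several MDE values; the 10,000/day send volume is an assumed figure for illustration only.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(p1, rel_mde, alpha=0.05, power=0.80):
    # Two-sided two-proportion z-test, unpooled variance (a common sketch).
    p2 = p1 * (1 + rel_mde)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2)

daily = 10_000  # assumed daily send volume, split across both arms
for mde in (0.05, 0.10, 0.20):
    n = n_per_arm(0.035, mde)
    days = ceil(2 * n / daily)  # project duration: total sample / daily volume
    print(f"MDE {mde:.0%}: {n:,} per arm, ~{days} days at {daily:,}/day")
```

At a 3.5% baseline, the 5% row comes out near four times the 10% row, which is why small-MDE tests on low baselines take months of volume.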
Go deeper
The long-form guides that explain the thinking behind the tool. Written for operators who want to know not just what to do, but why.
experimentation · 10 min read
A/B testing in email: sample size, novelty, and what to report
Most email A/B tests produce winners that don't reproduce. Three reasons keep showing up: under-powered samples, the novelty effect, and weak readout discipline. This guide is about designing tests that drive decisions instead of producing theatre.
experimentation · 8 min read
Sample size: the calculation everyone gets wrong in email A/B tests
Most email A/B tests are powered to detect effects far larger than the test could actually produce. The result: false positives and false nulls, with confident conclusions in both directions. Sample size calculation fixes this before you send. Here's the 5-minute version.
experimentation · 9 min read
Holdout group design: the incrementality tool most lifecycle programs skip
Without a holdout, lifecycle ROI is attribution-model guesswork with a spreadsheet. With one, you get a defensible number you can actually put in front of finance. Here's how to size, run, and read a holdout — and the three mistakes that quietly invalidate the result.
experimentation · 9 min read
Incrementality testing: the measurement that tells you if a program actually works
Last-click attribution makes lifecycle look bigger than it is. Incrementality testing strips out users who would have converted anyway and surfaces the real number. This is how to design a test that produces a figure you can defend in front of a CFO.
Free sample-size calculator for two-proportion A/B tests. Enter baseline conversion rate, minimum detectable effect, confidence, and power — get required sample per arm, total sample, projected test duration at your daily volume, and a sensitivity table across MDE values. Uses the standard (zα/2 + zβ)² formula. Pure client-side.
Lifecycle, growth, and experimentation leads who want to plan tests realistically instead of launching them and hoping for significance.
Using Claude?
Inside Orbit for Claude, the Experiment Design skill scopes the test (sample size, duration, guardrails), the Braze integration builds the segments and campaigns, and the Significance Calculator calls the winner. No spreadsheet. No copy-paste between tools. The whole cycle lives in one conversation. Free for everyone — the Claude extension is the power-user upgrade, not a gated feature.