Orbit web apps
Work out how many users you need in each arm before you can call a test. Takes baseline rate, minimum detectable effect, confidence, and power. Shows sensitivity across MDE values.
Your current rate on the control (e.g. 3.5 for 3.5%)
Smallest lift worth calling a win. 10% relative on a 3.5% baseline = detect 3.85%.
How many recipients you send to per day. Used to project test duration.
Required sample size
45,373
per arm · 90,746 total (control + 1 variant)
Detect lift from 3.50% → 3.85%
Two-proportion z-test, 95% confidence, 80% power. At this size, a true 10% relative lift (the operator standard) will be detected roughly four times out of five.
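For readers who want to check the headline number themselves, here is a minimal sketch of the two-proportion sample-size calculation using only Python's standard library. The function name and the unpooled-variance choice are mine, not the tool's; implementations differ slightly in pooling and rounding, so this sketch lands within a few dozen of the 45,373 shown above rather than matching it exactly.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate per-arm N for a two-sided two-proportion z-test.

    baseline:      control conversion rate, e.g. 0.035 for 3.5%
    relative_mde:  smallest relative lift worth detecting, e.g. 0.10
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)          # 3.5% -> 3.85% at a 10% MDE
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # power term
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # unpooled variance
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n)

print(sample_size_per_arm(0.035, 0.10))  # roughly 45,000-46,000 per arm
```

Multiply by two for the total across control plus one variant.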
Sensitivity — how N changes with MDE
Smaller lifts need much larger samples. If you want to detect a 5% relative lift on a low baseline, you'll need months of volume.
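The sensitivity claim follows from the formula itself: required N scales roughly with 1/MDE², so halving the detectable lift roughly quadruples the sample. A self-contained sketch, looping the same standard calculation over several MDE values; the 10,000/day send volume is an assumed figure for illustration only.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(p1, rel_mde, alpha=0.05, power=0.80):
    # Two-sided two-proportion z-test, unpooled variance (a common sketch).
    p2 = p1 * (1 + rel_mde)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2)

daily = 10_000  # assumed daily send volume, split across both arms
for mde in (0.05, 0.10, 0.20):
    n = n_per_arm(0.035, mde)
    days = ceil(2 * n / daily)  # project duration: total sample / daily volume
    print(f"MDE {mde:.0%}: {n:,} per arm, ~{days} days at {daily:,}/day")
```

At a 3.5% baseline, the 5% row comes out near four times the 10% row, which is why small-MDE tests on low baselines take months of volume.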
Go deeper
The long-form guides that explain the thinking behind the tool. Written for operators who want to know not just what to do, but why.
experimentation · 10 min read
A/B testing in email: sample size, novelty, and what to report
Most email A/B tests produce winners that don't reproduce. Three reasons keep showing up: under-powered samples, the novelty effect, and weak readout discipline. This guide is about designing tests that drive decisions instead of producing theatre.
experimentation · 8 min read
Sample size: the calculation everyone gets wrong in email A/B tests
Most email A/B tests are powered to detect effects far larger than the test could actually produce. The result: false positives and false nulls, with confident conclusions in both directions. Sample size calculation fixes this before you send. Here's the 5-minute version.
experimentation · 9 min read
Holdout group design: the incrementality tool most lifecycle programs skip
Without a holdout, lifecycle ROI is attribution-model guesswork with a spreadsheet. With one, you get a defensible number you can actually put in front of finance. Here's how to size, run, and read a holdout — and the three mistakes that quietly invalidate the result.
experimentation · 9 min read
Incrementality testing: the measurement that tells you if a program actually works
Last-click attribution makes lifecycle look bigger than it is. Incrementality testing strips out users who would have converted anyway and surfaces the real number. This is how to design a test that produces a figure you can defend in front of a CFO.
Free sample-size calculator for two-proportion A/B tests. Enter baseline conversion rate, minimum detectable effect, confidence, and power — get required sample per arm, total sample, projected test duration at your daily volume, and a sensitivity table across MDE values. Uses the standard (zα/2 + zβ)² formula. Pure client-side.
Lifecycle, growth, and experimentation leads who want to plan tests realistically instead of launching them and hoping for significance.
Using Claude?
Inside Orbit for Claude, the Experiment Design skill scopes the test (sample size, duration, guardrails), the Braze integration builds the segments and campaigns, and the Significance Calculator calls the winner. No spreadsheet. No copy-paste between tools. The whole cycle lives in one conversation. Free for everyone — the Claude extension is the power-user upgrade, not a gated feature.