Send-time optimisation: what it really moves, and what it doesn't
Picture a vendor demo for send-time optimisation — STO, the ESP feature that picks the perfect hour to email each individual user. The deck says forty percent open-rate lift. Your VP forwards it. Two weeks later you've paid for a premium tier and your numbers haven't moved. The deck wasn't lying exactly; it was measuring against a control that flattered the feature. Real lift is three to eight percent, shows up in opens rather than revenue, and only on certain kinds of programs. Here's what STO actually does, when it earns its keep, and when it's theatre.

By Justin Williames
Founder, Orbit · 10+ years in lifecycle marketing
The pitch, and the gap between the pitch and the thing
Imagine the inbox the way most people use it. You glance at your phone over coffee, again at lunch, maybe once after dinner. An email that arrived at 9am and one that arrived at 11am are read at the same moment — whenever you next pick up the phone. Send-time optimisation (STO — the ESP feature that picks an individual delivery hour for each subscriber based on their past open and click behaviour) is built on the bet that if it can land the email just before one of those glances, the email wins. Sometimes it does. Often it doesn't.
Different ESPs (email service providers — Braze, Iterable, Klaviyo, Mailchimp, the lot) run different models behind STO. Some look at nothing more sophisticated than the hour of your most recent open. Some bucket users by time-of-day and day-of-week. A handful run actual machine-learning models — algorithms trained on your engagement history — with richer feature sets. The marketing slide is always more sophisticated than the model underneath.
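To make the simpler tiers concrete, here's roughly what "bucket users by time-of-day" amounts to: no ML, just a histogram over past opens, with a fallback for thin data. Everything below (the function name, the five-open threshold, the 10am default) is a hypothetical sketch, not any vendor's actual implementation.

```python
from collections import Counter
from datetime import datetime

def best_send_hour(past_opens: list[datetime], fallback_hour: int = 10) -> int:
    """Pick the user's modal open hour from their engagement history.

    This is the 'bucket by time-of-day' tier of STO. With too little
    history the per-user signal is meaningless, so fall back to a fixed
    default, which is exactly what happens to brand-new subscribers.
    """
    if len(past_opens) < 5:  # hypothetical minimum-data threshold
        return fallback_hour
    hour_counts = Counter(ts.hour for ts in past_opens)
    return hour_counts.most_common(1)[0][0]
```

A day-of-week variant is the same trick with (weekday, hour) keys instead of hours.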
STO changes the time the email lands. It doesn't change the email, the audience, or the offer. The ceiling is bounded entirely by how much delivery hour affects engagement — and for most users, the honest answer is: not much.
STO earns its money on users with predictable, concentrated engagement windows — the commuter who only checks at 7:45am, the night-shift worker who reads at 11pm. For everyone else, the algorithm is picking between two equivalent rooms in an empty hotel.
Why the vendor numbers and the real numbers don't match
Vendor case studies show twenty to forty percent open-rate lift. Independent benchmarks — Mailchimp's own data, Litmus reports, academic studies — land on three to eight percent open-rate lift, one to four percent click-rate lift, and typically no significant revenue lift against a proper holdout (the random group you don't apply the feature to, the way a clinical trial works). That's a five-to-tenfold gap between the deck and reality. Three reasons why.
Apple MPP is poisoning the data. Mail Privacy Protection — the iOS feature, on by default since 2021, that pre-fetches every email the moment it arrives — fires an "open" before the user has so much as looked at their phone. STO can't tell the difference. The algorithm "learns" that the user engages at the pre-fetch hour, which has nothing to do with when the user actually engages, and the lift the vendor reports is often this inflation compounding with itself. The user could be asleep. Still counts as an open. Lovely system.
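If you want to see how much of your "engagement" is machine-generated before trusting any send-time model, a rough filter on the open event's user agent is a common first pass. The marker string and event shape below are assumptions; verify both against your ESP's raw event logs before relying on them.

```python
# Substring commonly attributed to Apple's Mail Privacy Protection proxy.
# Treat it as a starting assumption and check it against your own raw opens.
MPP_UA_MARKER = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15"

def is_machine_open(user_agent: str) -> bool:
    """Flag opens that look like MPP pre-fetches rather than humans."""
    return MPP_UA_MARKER in user_agent

def human_open_hours(open_events: list[dict]) -> list[int]:
    """Hours of suspected-human opens only. Feeding unfiltered opens to a
    send-time model teaches it the pre-fetch hour, not the user's hour."""
    return [
        event["timestamp"].hour
        for event in open_events
        if not is_machine_open(event.get("user_agent", ""))
    ]
```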
Confounded comparisons. Many case studies compare STO sends to a control group sent at a different fixed time, not to a random-time holdout drawn from the same population. That's selection bias dressed up in a PDF — you're comparing two different audiences, not two different strategies.
Small effect, noisy metric. Open rates jump around a lot from send to send. A real five-percent lift sits comfortably inside the natural variance of a single broadcast — you can't see the signal without enough sends to average out the noise.
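A quick simulation makes the point. Two arms with a genuine five percent relative lift, a few thousand recipients each: the observed lift swings well past the true effect in both directions, send to send. The rates and sizes are illustrative, not from any real program.

```python
import random

def observed_open_rate(n_recipients: int, true_rate: float) -> float:
    """One send's measured open rate: binomial noise around the true rate."""
    opens = sum(random.random() < true_rate for _ in range(n_recipients))
    return opens / n_recipients

random.seed(7)
control_rate, sto_rate = 0.20, 0.21  # a real +5% relative lift
for send in range(5):
    c = observed_open_rate(2_000, control_rate)
    t = observed_open_rate(2_000, sto_rate)
    print(f"send {send}: control {c:.3f}  sto {t:.3f}  lift {(t - c) / c:+.1%}")
# At this scale the single-send lift ranges from clearly negative to roughly
# double the true effect; only averaging many sends recovers the +5%.
```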
When STO is genuinely worth the money
Three situations where it pays back, plainly.
Global audiences across time zones. Users in Sydney opening at 9am local and users in New York opening at 9am local need different send times — fourteen to sixteen hours apart, depending on the season. STO (or even simple time-zone-aware sending) stops one cohort waking up to a 3am ping. Always worth doing.
Broadcast campaigns with no urgency. Newsletters, content drops, non-promotional broadcasts where it doesn't matter if user A reads at 9am and user B reads at 4pm. STO delivers modest lift, no real downside, low cost to leave on.
Large, diverse audiences. Above 500k MAU (monthly active users — the standard scale benchmark for consumer programs), individual send-time differences start to aggregate into something measurable, and the algorithm has enough data per user that its predictions are non-random. STO scales with data availability; under that threshold, the engine is mostly guessing.
When STO is the wrong lever — or actively harmful
Time-sensitive sends. A flash sale ending in four hours cannot wait for each user's preferred hour. Send now. STO is the wrong tool — it would happily delay your urgency promotion until tomorrow morning if that's when the user usually engages.
Triggered sends. Welcome emails, order confirmations, password resets — the user's action is the trigger. STO would delay these, which is exactly the opposite of what the user wants. Running STO on a password reset is one of those decisions you can tell was made without anyone reading the spec.
New users with no history. STO needs past behaviour to optimise on. A user who signed up yesterday has none, so STO falls back to the category average — roughly equivalent to picking a reasonable time manually. Many programs unthinkingly apply STO to welcome emails and produce worse performance than a fixed send would have done.
Small audiences under 50k. Most users have too few data points for per-user optimisation to mean anything, the algorithm falls back to the category average, and you've paid for a premium feature to get the default answer.
The unsexy alternative that captures most of the lift
For most programs, time-zone-aware sending — every recipient hit at their local 10am, no per-user modelling — captures roughly eighty percent of STO's value with none of the complexity. No premium tier. No dependency on machine-open-inflated data. Native in every modern ESP.
The lift versus "everyone at 10am UTC" is usually in the same neighbourhood as what STO achieves against the same baseline. Which means most of the value STO claims is actually time-zone handling wearing a fancy jacket and charging you for the privilege.
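For reference, the entire mechanism is a few lines of standard-library date arithmetic; no model anywhere. A minimal sketch, assuming you store each recipient's IANA time zone:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def next_local_send(recipient_tz: str, local_hour: int = 10,
                    now_utc: datetime | None = None) -> datetime:
    """UTC timestamp of the recipient's next local `local_hour`:00."""
    now_utc = now_utc or datetime.now(ZoneInfo("UTC"))
    local_now = now_utc.astimezone(ZoneInfo(recipient_tz))
    target = local_now.replace(hour=local_hour, minute=0, second=0, microsecond=0)
    if target <= local_now:  # already past today's slot locally, so tomorrow
        target += timedelta(days=1)
    return target.astimezone(ZoneInfo("UTC"))

# Sydney and New York recipients each get their own local 10am:
for tz in ("Australia/Sydney", "America/New_York"):
    print(tz, next_local_send(tz).isoformat())
```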
The A/B testing playbook covers how to validate STO versus time-zone sending versus fixed-time sending for your specific audience — the rigour underneath the recommendation.
What to say when your ESP pitches STO as a premium upgrade
Most ESPs sell STO as a paid tier. The pitch leans on vendor case studies with the methodology problems described above. Four moves before you sign anything.
1. Ask for independent validation, not vendor case studies.
2. Ask how they measure lift — against what control group? If they say "a different fixed-time send," that's the confounded comparison from earlier; the answer you want is "a random-time holdout from the same population."
3. Run the thirty-day holdout test on your own data before deciding (a readout sketch follows this list).
4. Compare measured lift against simple time-zone-aware sending — which is free in most ESPs.
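For step 3, the readout itself is simple: pool the thirty days of sends and compare open rates between the STO arm and the random-time holdout with a two-proportion z-test. The counts below are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(opens_a: int, n_a: int,
                     opens_b: int, n_b: int) -> tuple[float, float]:
    """z statistic and two-sided p-value for the open-rate difference
    between an STO arm (a) and a random-time holdout (b)."""
    p_a, p_b = opens_a / n_a, opens_b / n_b
    pooled = (opens_a + opens_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Invented counts: 30 days of sends, pooled per arm.
z, p = two_proportion_z(opens_a=21_400, n_a=100_000, opens_b=20_500, n_b=100_000)
print(f"lift {21_400/100_000 - 20_500/100_000:+.1%} absolute, z={z:.2f}, p={p:.4f}")
```

If STO's measured lift over the holdout is statistically indistinguishable from what time-zone-aware sending achieves against the same holdout, the premium tier isn't earning its keep.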
The usual conclusion: time-zone-aware captures most of the lift, and STO's marginal addition doesn't justify the premium. For some programs — large, global, broadcast-heavy — the premium is worth it. The discipline is measuring before deciding, not buying the feature because it sounds clever in a meeting.
One side note on testing send times generally: don't bother with 9am versus 10am. Too similar. Test 10am versus 6pm, or weekday versus weekend — windows that are genuinely different. Once you've found the right rough window, fine-tuning inside it rarely moves anything real.
The holdout group design guide covers the holdout methodology for validating vendor-marketed features. STO is one of several where the claimed lift diverges meaningfully from the measured effect. There will be more — this is the industry we picked.
Related guides
A/B testing in email: sample size, novelty, and what to report
Most email A/B tests produce winners that don't reproduce. Three reasons keep showing up: under-powered samples, the novelty effect, and weak readout discipline. This guide is about designing tests that actually drive decisions instead of theatre.
Price-testing through email: what's testable, what isn't
Email is the fastest place to try a new price, and the easiest place to learn the wrong lesson. What you can test cleanly, what you can't, and the measurement traps that quietly turn price tests into expensive false positives.
Sample size: the calculation everyone gets wrong in email A/B tests
Most email A/B tests are powered to detect effects far larger than the test could actually produce. The result: false positives and false nulls, with confident conclusions in both directions. Sample size calculation fixes this before you send. Here's the five-minute version.
False positives in email A/B tests: why half of winning tests don't actually win
Run enough A/B tests and some will show 'significant' lift from pure noise. Programs that ship every significant winner end up with a collection of imaginary improvements they can't tell apart from real ones. Here's how to spot the fakes and avoid the trap.
Holdout group design: the incrementality tool most lifecycle programs skip
Without a holdout, lifecycle ROI is attribution-model guesswork with a spreadsheet. With one, you get a defensible number you can actually put in front of finance. Here's how to size, run, and read a holdout — and the three mistakes that quietly invalidate the result.
Incrementality testing: the measurement that tells you if a program actually works
Last-click attribution makes lifecycle look bigger than it is. Incrementality testing strips out users who would have converted anyway and surfaces the real number. This is how to design a test that produces a figure you can defend in front of a CFO.