Updated · 8 min read
Inbox placement testing: seed lists, their limits, and what to do instead
Picture the dashboard most lifecycle teams check on a Monday morning: a clean number that says "82% inbox, 18% spam" next to a green tick. The exec asks if deliverability is healthy, someone screenshots the green number, everyone moves on. The trouble is that number measures 100 test addresses Validity owns — not the 4 million real subscribers your program actually sends to. It's a thermometer dipped in a bucket next to the swimming pool. Useful for noticing when something is drastically wrong. Useless for telling you the water temperature where anyone's swimming.

By Justin Williames
Founder, Orbit · 10+ years in lifecycle marketing
The dashboard everyone trusts and shouldn't
A seed-list tool — Validity Everest, Litmus Email Guardian, GlockApps, MailGenius — owns a set of test mailboxes (the "seed list") at the major inbox providers: Gmail, Outlook, Yahoo, Comcast. You add those addresses to your real send. After the campaign goes out, the tool checks where each test message landed: inbox, spam folder, Promotions tab (Gmail's second-class mailbox), or vanished entirely. It aggregates the results and reports a number.
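The aggregation step is simple enough to sketch. Below is a minimal illustration of per-seed results rolled up into the per-provider percentages a dashboard shows. The results payload here is invented for illustration; real tools expose theirs through their own dashboards and APIs.

```python
from collections import Counter

# Hypothetical per-seed results; the payload shape is invented for
# illustration, real tools report through their own dashboards and APIs.
seed_results = [
    {"provider": "gmail", "placement": "inbox"},
    {"provider": "gmail", "placement": "promotions"},
    {"provider": "gmail", "placement": "spam"},
    {"provider": "yahoo", "placement": "inbox"},
    {"provider": "yahoo", "placement": "missing"},
]

def placement_report(results):
    """Roll per-seed results up into per-provider placement percentages."""
    by_provider = {}
    for r in results:
        by_provider.setdefault(r["provider"], Counter())[r["placement"]] += 1
    return {
        provider: {placement: round(100 * n / sum(counts.values()))
                   for placement, n in counts.items()}
        for provider, counts in by_provider.items()
    }

print(placement_report(seed_results))
# {'gmail': {'inbox': 33, 'promotions': 33, 'spam': 33}, 'yahoo': {'inbox': 50, 'missing': 50}}
```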
The output looks authoritative. "Gmail: 78% inbox, 12% Promotions, 10% spam." Sum across providers, get an overall "inbox placement rate," and that number ends up in the deck for the QBR — quarterly business review, the meeting where marketing leadership tells the exec team what's working. From there it gets quoted as "our deliverability," which is the moment the metric stops meaning what it meant.
The seed list is 100–200 test addresses. Your real audience is millions of real people with years of personal engagement history. A seed list can tell you if there's a gross delivery problem. It cannot tell you what individual users see.
Why the green number lies (politely)
Four reasons the seed-list number isn't the deliverability rate you think it is. None of them are the tool's fault — they're structural to how seed testing works.
Seed addresses have no relationship with you. Personal Gmail filtering — how Gmail decides inbox vs. Promotions vs. spam for a specific user — leans heavily on engagement history. Has this person opened your mail before? Replied? Marked it "not spam"? Seed accounts have none of that. They reflect what a stranger sees, not what your engaged subscribers experience. Your real audience generally gets better placement than the seed list suggests, which means the seed number is closer to a lower bound than a true measurement.
"Gmail" in seed results is three addresses, not Gmail. Real Gmail filtering varies by user, by recent behaviour, by location, by which Gmail data centre routes the message. A handful of test addresses won't capture any of that variation. You're measuring three points and calling it a population.
Sample size — and the noise it brings. A seed list of 50 addresses has a margin of error of roughly ±11% on a reported 80% placement rate at 95% confidence (the arithmetic is sketched after this list). Which means "78% inbox this week, 82% next week" is statistical noise — the random wobble you get from any small sample, the same reason a 10-person poll can't predict an election. It's not a real change. It's definitely not a metric worth steering the program by.
Tests are easy to game, even by accident. Sending only to seed addresses, from a clean IP, with clean content, produces better numbers than your production send. If the test isn't inside a normal campaign, you're measuring best-case placement and reporting it as real placement. Useful only if you're hunting a reassuring number rather than a true one.
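To put a number on the sample-size point above, here's the standard normal-approximation margin of error for a proportion, assuming a 95% confidence level:

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error for a proportion (z = 1.96 for 95%)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# 50 seeds reporting 80% inbox placement:
print(f"±{margin_of_error(0.80, 50):.1%}")  # ±11.1%, the true rate could sit anywhere in 69-91%
# A week-over-week move from 78% to 82% is well inside that band.
```

At 200 seeds the band narrows to roughly ±5.5%, which is why the trend matters more than any single week's figure.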
So is the inbox placement rate accurate? Directionally yes, literally no. Trend matters more than the absolute figure. The seed number is a smoke alarm, not a thermostat.
What seed lists are genuinely good for
Worth being clear: dismissing seed-list reports entirely would be wrong. They earn their place. They just don't earn the place programs usually give them.
Catching a fire fast. Normal placement at 80%+ that suddenly drops to 40%. Something broke — authentication, sending reputation, content. The seed list catches this faster than your real engagement data, because engagement metrics lag the failure by days while the seed test catches it inside an hour.
Provider-specific diagnosis. Seed shows 85% inbox at Gmail, 30% at Yahoo. You have a Yahoo-specific problem — almost certainly authentication or list hygiene affecting Yahoo's spam filter specifically. The gap between providers is more informative than either provider's number on its own.
A/B comparison between variants. Two campaign variants, both run through the same seed list. Variant A shows 80% placement, variant B shows 60%. A is better — within the usual noise caveat; a quick significance check, sketched after this list, tells you whether the gap clears it. Internal comparison cancels out a lot of the things that make absolute numbers untrustworthy.
Pre-send sanity check. Before a big send, a seed test catches obvious problems — a broken DKIM signature, a typo in the from-name, a tracking domain that's tanked your reputation overnight — before you hit the full list. Worth it even when the headline number doesn't mean what people think.
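To make the noise caveat concrete, here's a standard two-proportion z-test applied to the variant comparison above. The per-variant seed counts are assumed for illustration.

```python
import math

def two_proportion_z(hits_a: int, n_a: int, hits_b: int, n_b: int) -> float:
    """Two-proportion z-statistic; |z| > 1.96 is significant at the 95% level."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    p_pool = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Variant A inboxes 80 of 100 seeds, variant B 60 of 100:
print(f"{two_proportion_z(80, 100, 60, 100):.2f}")  # ~3.09, the gap is real
# The same 20-point gap on only 25 seeds per variant (20/25 vs 15/25):
print(f"{two_proportion_z(20, 25, 15, 25):.2f}")    # ~1.54, inside the noise
```

The same 20-point gap that is decisive at 100 seeds per variant dissolves at 25, which is the noise caveat in one line.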
The signals that actually tell you what's happening
The seed list is one input among several. Here's the ranking that actually reflects how much each one tells you, ordered by how close each signal sits to what your real audience experiences.
1. Actual engagement data. Open rate, click rate, revenue per send across your real audience — measured directly from your ESP (email service provider, the platform you send from: Braze, Iterable, Klaviyo and the like). Real users, real mail. Open rate is noisy now because Apple Mail Privacy Protection — Apple's 2021 feature that pre-fetches every email so opens fire whether the user reads or not — broke it for the half of your list on Apple devices. Click and revenue per send are still solid. When these are healthy, deliverability is working regardless of what the seed list says.
2. Google Postmaster domain reputation. Gmail's own four-tier rating of your sending domain (Bad / Low / Medium / High), free, straight from the source. Much more trustworthy than any third-party measurement for Gmail specifically — Google is telling you what Google thinks. Walkthrough in the Postmaster guide.
3. Spam complaint rate. The percentage of recipients clicking "Report spam," pulled from your ESP's feedback-loop data — the back-channel inbox providers use to tell senders about complaints. Leading indicator: if this is climbing, the seed number will follow in a few weeks. Gmail's bulk-sender guidelines say to keep it below 0.1% and treat 0.3% as the ceiling past which filtering kicks in (thresholds sketched after this list).
4. Seed-list inbox placement. Useful as secondary confirmation and for provider-specific diagnosis. Not the number you lead with. Not the number you put in front of an exec.
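A minimal sketch of the complaint-rate check, using Gmail's published bulk-sender thresholds. The counts are invented; real numbers come from your ESP's feedback-loop reporting.

```python
def complaint_rate_status(complaints: int, delivered: int) -> str:
    """Classify a campaign's spam complaint rate against Gmail's published
    bulk-sender thresholds: keep it below 0.10%, never reach 0.30%."""
    rate = complaints / delivered
    if rate >= 0.003:
        return f"{rate:.2%}: at or over the 0.3% ceiling, expect filtering"
    if rate >= 0.001:
        return f"{rate:.2%}: above the 0.1% target, investigate now"
    return f"{rate:.2%}: healthy"

print(complaint_rate_status(412, 180_000))  # 0.23%: above the 0.1% target, investigate now
print(complaint_rate_status(95, 180_000))   # 0.05%: healthy
```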
What to do when the seed list says 60% inbox but engagement metrics are fine? Trust the engagement metrics. They're measuring what your real audience actually experiences. A seed-list 60% on a program with healthy engagement usually means the seed list skews toward a harsher filtering environment than your real audience's average. Monitor. Don't panic.
When a seed list is worth paying for
Seed-list tools are useful but not essential. The right call depends on program size and deliverability stakes, with monthly send volume as the cleanest cut:
Under 500K monthly sends: probably skip. Free tools (Postmaster Tools, Mail Tester) plus your real engagement metrics do the job. The $200–$1000/month seed-list cost isn't justified at this scale — you don't have enough volume for small placement deltas to matter financially.
500K–10M monthly: valuable for provider-specific diagnosis and pre-send checks on big campaigns. A mid-tier Validity or GlockApps subscription is reasonable here.
10M+ monthly: probably essential. At this scale small placement improvements translate into meaningful revenue, and the real-time monitoring earns the premium tier.
Which tool? Validity Everest and GlockApps are the market leaders. Litmus has a smaller but solid offering. For most programs they're roughly equivalent in capability — choose on pricing, UI, and how well each one plays with the rest of your stack. Running two in parallel doesn't meaningfully increase signal. It just doubles the cost and gives you two numbers to argue about.
Before a big campaign, the three-step playbook actually worth running: send to your seed list plus a small internal list, check rendering and placement; release a 10% sample of your real audience, wait four hours, check engagement for anomalies; release the remaining 90% once the sample looks clean. Catches most production issues before full-scale send, and the seed number is one input of several rather than the whole story.
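As a sketch of that playbook in code: the esp client and its send_campaign / get_engagement calls are hypothetical placeholders standing in for your ESP's real API, and the anomaly threshold is illustrative.

```python
import time

def staged_send(campaign_id: str, audience: list[str], esp, seed_list: list[str],
                sample_pct: float = 0.10, wait_hours: float = 4,
                max_complaint_rate: float = 0.001) -> None:
    """Three-step release: seed pass, 10% sample, then the remaining 90%."""
    # 1. Seed + internal pass: catch rendering and gross placement problems.
    esp.send_campaign(campaign_id, recipients=seed_list)  # hypothetical client call
    input("Check rendering and seed placement, then press Enter...")

    # 2. Release a sample of the real audience and let engagement accrue.
    cut = int(len(audience) * sample_pct)
    sample, remainder = audience[:cut], audience[cut:]
    esp.send_campaign(campaign_id, recipients=sample)
    time.sleep(wait_hours * 3600)

    # 3. Gate the remaining send on the sample's complaint rate.
    stats = esp.get_engagement(campaign_id)  # hypothetical client call
    if stats["complaints"] / stats["delivered"] > max_complaint_rate:
        raise RuntimeError("Sample complaint rate anomalous; hold the send.")
    esp.send_campaign(campaign_id, recipients=remainder)
```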
The Deliverability Management skill uses seed-list as one of several inputs, never in isolation. The most informative view combines real engagement data, Google Postmaster Tools, and seed-list results — each catches failure modes the others miss, and none of them is reliable enough to stand alone.
Related guides
List hygiene: the six-rule policy
List hygiene isn't cleanup; it's a continuous policy that runs automatically. Here's the six-rule policy every lifecycle program should have written down, each tied to a specific deliverability outcome.
Apple Mail Privacy Protection, four years in
Apple broke the open rate in 2021. Half the lifecycle industry is still pretending it didn't happen. Four years on, the programs that actually adapted are beating the ones that kept optimising a metric that doesn't exist anymore.
Google Postmaster Tools: a walkthrough for people who actually send email
Postmaster Tools is the single most valuable free deliverability tool and most programs either ignore it or misread the charts. Here's what each tab actually says, what to act on, and what to stop looking at.
The deliverability mental model: one picture for authentication, reputation, content, and monitoring
Most deliverability guides cover one piece — SPF, DKIM, DMARC, BIMI, reputation, warmup — and assume you already know how the pieces fit. This is the picture they assume: how a mailbox provider decides whether your email reaches the inbox, what each acronym actually does inside that decision, and where to look first when placement tanks.
Email deliverability — the practitioner's guide
Deliverability isn't a setting. It's the running total of every send decision you've made since you bought the domain. Four pillars hold it up. Break one and the whole program starts leaking.
IP warm-up in Braze — the playbook that actually holds
A fresh dedicated IP has zero reputation on day one. Most warm-up guides fixate on ramp speed and ignore the harder question — which users get the send each day. Here's the schedule, the Random Bucket Number trick, and the day-10 mistake that ruins most of them.