9 min read
AI personalisation at scale: the architecture that actually works
Picture the meeting. The vendor demo finishes, the AI feature is on the roadmap, somebody on the team flips the switch a fortnight later, and three months on, the dashboard hasn't really moved. Braze ships BrazeAI (the rebrand of what was Sage AI). Iterable ships AI Optimization. Klaviyo ships AI everything. Salesforce ships Einstein. The pitch is identical: turn the AI on, watch the numbers go up. The reality is that AI personalisation only produces lift when the architecture underneath it is healthy — clean events, real-time profiles, modular content, and a measurement layer that catches when the model is wrong. This is the architecture brief, not the vendor brochure.

By Justin Williames
Founder, Orbit · 10+ years in lifecycle marketing
First, what "AI personalisation" actually means in this room
AI personalisation is a stack, not a feature. The model is the visible 10%. The data, content, and activation layers underneath are the 90% that decides whether the model has anything useful to say.
Quick scene-setting before the architecture, because "AI personalisation" is one phrase doing three different jobs. When a vendor says it, they could mean any of the three below — and the failure modes, the cost, and the lift profile are different for each.
An ESP — email service provider, the platform that actually sends your messages, like Braze, Iterable, Klaviyo, HubSpot — typically packages all three under one "AI" label. Useful to separate them.
Predictive personalisation — models that score each user with a number (churn risk, propensity to buy, predicted lifetime value) and route the message accordingly. Braze Predictive Suite, Iterable Brain, Klaviyo Predictive Analytics, Salesforce Einstein. The output is a number per user that drives a decision ("send the win-back to anyone with churn risk above 0.7").
Generative personalisation — models (LLMs, the same family of large language models behind ChatGPT and Claude) that produce content per user or per cohort: subject lines, body copy, product blurbs. BrazeAI (formerly Sage AI), Iterable Copy Assist, Klaviyo Subject Line generator. The output is the actual words the user reads in the inbox.
Optimisation personalisation — models that pick from a candidate set the marketer pre-defined: best send time, best variant, best frequency. Send-time optimisation (STO), multi-armed bandits (an algorithm that allocates more traffic to whichever variant is winning, in real time), next-best-action engines. The output is a decision among options you already wrote.
Different stacks, different failure modes, different return-on-investment profiles. A program that says "we're using AI personalisation" without naming which of the three is using marketing language to obscure an architectural question. Job number one of any AI personalisation effort is naming which capability is being deployed and why.
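To make the optimisation flavour concrete, here's a minimal sketch of a multi-armed bandit over subject lines. It uses Thompson sampling, one common bandit algorithm; the class and names are illustrative, not any ESP's API, and the variants are copy a marketer already wrote.

```python
import random

class SubjectLineBandit:
    """Thompson sampling over a fixed set of pre-written variants."""

    def __init__(self, variants: list[str]):
        # One Beta(wins + 1, losses + 1) posterior per variant.
        self.stats = {v: {"wins": 0, "losses": 0} for v in variants}

    def choose(self) -> str:
        # Sample a plausible conversion rate per variant; send the best draw.
        draws = {
            v: random.betavariate(s["wins"] + 1, s["losses"] + 1)
            for v, s in self.stats.items()
        }
        return max(draws, key=draws.get)

    def record(self, variant: str, converted: bool) -> None:
        self.stats[variant]["wins" if converted else "losses"] += 1
```

Each send samples a plausible conversion rate per variant from the evidence so far, so traffic drifts toward the winner without a hard A/B cutover. The marketer still wrote every candidate; the model only chooses among them.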
The four-layer stack — and why the model is the boring bit
Most lifecycle teams want to argue about which model is best. The model is rarely the problem. Every AI personalisation deployment that produces measurable lift sits on top of four layers underneath the model. Skip one and the clever bit on top is decoration.
Picture it as a building: the model is the penthouse, but if the foundations are cracked the whole thing leans.
Layer 1 — Event capture. Real-time, well-named, complete events — the records of what users did and when, like added_to_cart or completed_signup — flowing from your product into the ESP. Without this the model trains on a partial picture and fires recommendations on stale signal. The most common cause of "our AI doesn't work" is that events lag by hours, fire inconsistently across web and mobile, or carry names like click_button_2 that mean nothing to anyone, model included.
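To make the contrast concrete, a sketch of the two payloads side by side; the field names and the object_action naming convention are illustrative, not any specific SDK's schema.

```python
from datetime import datetime, timezone

# Two versions of the same user action.

# What the model can't use: ambiguous name, no properties, no timestamp,
# batched to the ESP hours later.
bad_event = {"name": "click_button_2", "user_id": "u_123"}

# What the model can learn from: object_action naming, properties that
# carry signal, timestamped when it happened and sent in real time.
good_event = {
    "name": "added_to_cart",
    "user_id": "u_123",
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "properties": {"product_id": "sku_889", "price": 34.00, "currency": "GBP"},
}
```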
Layer 2 — Unified user profile. A single, real-time view of the user that combines behavioural events, custom attributes (the stored facts about a user — plan tier, sign-up date, country), subscription state, and product catalog interactions. Braze ships this natively; Iterable, Klaviyo, and HubSpot ship variations. If your profile is fragmented across systems and the AI only sees one slice, expect performance bounded by what that slice contains.
Layer 3 — Modular content. Content Blocks (Braze's reusable, swappable email components), dynamic blocks, catalog items — anything the model can swap into a message without a human re-authoring the email. A program with one master HTML file per send and no modular structure has nowhere for AI personalisation to write its output. The model produces a recommendation; there's no slot to drop it into. This is where most programs stall — not in the model, in the place where the model's output is supposed to land.
Layer 4 — Activation logic. The runtime that picks which version of which block goes to which user. Braze does this through Connected Content (a feature that pulls live data from outside Braze into the email at send time) plus Catalogs and Liquid (the templating language most ESPs use to inject personalised values into messages — same family as the Liquid in Shopify). Iterable does it through Catalog Lookup, Klaviyo through dynamic content blocks. The model's output is meaningless without an activation layer that can route it to the right user at the right moment.
What this looks like in Braze (and the same shape in everyone else)
Braze packages the four layers into named features. Useful to map them, since most operators inherit a Braze instance with some of these turned on without anyone documenting why. If you're on a different ESP the names change and the shape doesn't — Iterable, Klaviyo, HubSpot and Salesforce all have the same four layers under different brand labels. The translation table is at the bottom of this section.
BrazeAI. The generative layer — subject lines, copy variations, image suggestions inside the Braze composer. Rebranded from Sage AI; same feature set, new umbrella name. It operates on the message in front of you, not the broader program. Useful for accelerating production, less useful for personalisation in the strict sense — it generates copy a marketer reviews, not per-user output that varies in real time.
Predictive Suite. Predictive layer — churn risk, conversion propensity, lifetime value (LTV) prediction. Output: a score per user that becomes filterable in segment logic. Quality varies dramatically with data volume — a program with 10,000 monthly active users (MAU) and a thin event history gets a worse model than one with 10 million MAU and three years of clean events. The model is the same. The input is the bottleneck.
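What "filterable in segment logic" means in practice, as a sketch; esp_client and the attribute name churn_risk are stand-ins, not Braze's actual API.

```python
CHURN_THRESHOLD = 0.7  # illustrative; tune against your own holdout results

def route_winback(users: list[dict], esp_client) -> None:
    """Trigger the win-back campaign for every user the model scores at-risk."""
    for user in users:
        # Once the suite syncs it, the score is just another profile attribute.
        if user.get("churn_risk", 0.0) > CHURN_THRESHOLD:
            esp_client.trigger_campaign(campaign_id="winback_v2",
                                        user_id=user["id"])
```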
Connected Content. Activation layer — pulls real-time data from external APIs into the message at send time. Think of it as a placeholder in the email that, the moment the email is being rendered for a specific user, calls out to a service of your choosing and asks "what should this person see?". It's the bridge between AI output produced elsewhere (your warehouse, an internal recommendation service, a Vertex AI endpoint, an OpenAI call) and the rendered message. The most underused Braze feature in most programs and the most powerful when you outgrow Predictive Suite's built-ins.
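A sketch of the service on the other end of that call, assuming FastAPI; the endpoint path, field names, and catalogue are invented. The docstring shows roughly what the Braze side looks like, using the documented connected_content tag.

```python
from fastapi import FastAPI

app = FastAPI()

# Illustrative stand-ins: in production this reads a feature store or cache.
RECS = {
    "u_123": {"product_name": "Trail Jacket", "product_url": "https://example.com/p/889"},
}
FALLBACK = {"product_name": "Bestsellers", "product_url": "https://example.com/bestsellers"}

@app.get("/recs/{user_id}")
def recommend(user_id: str) -> dict:
    """Answer "what should this person see?" for one render of one message.

    The Braze side references it with the connected_content tag, roughly:

        {% connected_content https://api.example.com/recs/{{${user_id}}} :save recs %}
        {{ recs.product_name }}

    This endpoint sits in the send path, so its worst-case latency is the
    send's worst-case latency; always return something renderable.
    """
    return RECS.get(user_id, FALLBACK)
```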
Catalogs. Content layer — the structured product, content, or asset library that AI personalisation references. Without a properly structured Catalog, recommendations are limited to what fits in custom attributes (a handful of fields per user). Catalog quality — taxonomy, completeness, freshness — directly bounds recommendation quality. Junk in, junk recommendations out.
Liquid. Activation layer (the runtime). Every dynamic decision in a Braze message resolves through Liquid syntax. The Liquid reference covers the syntax. The activation layer is where bad architecture becomes visible to the user — a broken Liquid expression ships the literal {{ ${first_name} }} to 50,000 people regardless of how clever the AI was upstream, which is why a | default filter on every personalised field is cheap insurance. Every lifecycle lead with five years on the tools has shipped one of those at least once.
The same architecture maps onto the other ESPs. Different feature names, identical layers:
- Iterable: Brain + Catalogs + Catalog Lookup
- Klaviyo: Predictive Analytics + dynamic blocks + product feeds
- HubSpot: Smart Content + Personalization Tokens
- Salesforce Marketing Cloud: Einstein + Personalization Builder
If you can find the four layers, you can run this stack.
The three architectural choices that decide whether the lift shows up
Once the four layers are healthy, three decisions decide whether AI personalisation moves revenue or just produces a nice-looking dashboard.
Build vs buy the model. The AI features your ESP ships out of the box are good enough for activation, churn risk, and basic recommendations at small-to-mid scale. They stop being good enough when the use case is specific to your domain — a fashion marketplace needs style-similarity recommendations the ESP doesn't ship; a B2B SaaS needs intent scoring tied to product usage patterns; a marketplace needs supply-side and demand-side signals stitched together. The build-vs-buy call should be made on use-case specificity, not on which option sounds more sophisticated. The AI Personalisation skill covers the decision framework.
Real-time vs batch activation. Real-time scoring (Connected Content calling a recommendation API at the moment the email is being assembled for a user) opens the door to fresh signal but adds latency and a runtime dependency — if the API is slow or down, your send is slow or down. Batch activation (recommendations precomputed in your warehouse and synced to user attributes nightly) is more reliable but always up to 24 hours stale. Most programs over-rotate to real-time when batch would do, then under-invest in the latency monitoring that makes real-time safe. Pick based on how fresh the signal needs to be. A product recommendation can be a day old. A churn-save trigger probably can't.
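A sketch of the real-time path with that failure mode handled, using the requests library; the URL, the 300 ms budget, and the fallback shape are all assumptions to tune.

```python
import requests

def get_recommendation(user_id: str, batch_fallback: dict) -> dict:
    """Real-time recommendation with the failure mode handled in code."""
    try:
        resp = requests.get(
            f"https://api.example.com/recs/{user_id}",
            timeout=0.3,  # hard 300 ms budget: beyond this, the send is what's slow
        )
        resp.raise_for_status()
        return resp.json()
    except requests.RequestException:
        # Degrade to last night's precomputed attribute, never to a broken slot.
        return batch_fallback
```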
How much human review the model output gets. Generative AI output that ships unreviewed will eventually produce something off-brand, factually wrong, or quietly offensive. Generative AI output that gets human-reviewed before every send is too slow to scale. The middle path: the model produces variants, a human approves the first batch per template, then the system rotates approved variants automatically. Klaviyo and Iterable have built variants of this pattern. BrazeAI is more interactive — per-message generation in the composer — and works better in a pre-send review flow than a fully automated one.
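A minimal sketch of the middle path, assuming the variants have already passed human review; the template key and copy are invented.

```python
import hashlib

# Only copy a human approved ever ships. The model refills this pool per
# template; a reviewer signs off once per batch.
APPROVED = {
    "onboarding_day_1": [
        "Your first week, sorted",
        "Three things to try today",
        "Start here: it takes two minutes",
    ],
}

def pick_variant(template_id: str, user_id: str) -> str:
    variants = APPROVED[template_id]
    # Deterministic bucket: the same user always gets the same variant.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return variants[bucket % len(variants)]
```

Deterministic bucketing means each user sees a consistent voice while the program as a whole rotates through the approved set.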
What an honest rollout actually looks like (not the vendor case study)
The temptation is to flip every switch on day one and see what happens. The pattern that works is the opposite: one capability, one program, one measurement window, expand only after the previous expansion produced a measurable, holdout-validated lift.
A defensible rollout sequence:
1. Pick one program where the data is clean and the activation layer already exists. Usually onboarding, abandoned cart, or post-purchase. Programs with messy data or no modular content are fixable but should be fixed before AI gets layered on.
2. Pick one capability. Predictive scoring on a single segment, or AI subject lines on a single template, or a single product recommendation slot. Not five at once. The goal is to learn whether this lever moves this metric for this audience.
3. Run for at least 30 days against a holdout. A holdout is the random group you deliberately don't expose to the new thing — your control, the way a clinical trial works. 10–20% of the audience receives the non-AI version; a sketch of the assignment logic follows this list. The holdout group guide covers the design. Anything shorter is noise; anything without a holdout is vendor marketing.
4. Read out against the metric the program exists for. Typically downstream conversion or revenue, not opens or clicks. Apple Mail Privacy Protection (MPP — the iOS feature that pre-fetches images, including the tracking pixel that records opens, before the user has even looked at the email) makes the open rate unreliable; AI personalisation often inflates opens without moving revenue. The measurement guide covers the honest readout.
5. If the holdout test produces real lift, expand. Second program or second capability. If it doesn't, don't expand. Diagnose why — usually it's back at the data or activation layer rather than the model itself.
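The assignment logic referenced in step 3, as a sketch: deterministic hashing means a user never flips between groups mid-test and there's no assignment table to maintain. The salt and percentage are assumptions.

```python
import hashlib

def in_holdout(user_id: str,
               salt: str = "ai_personalisation_v1",
               holdout_pct: float = 0.15) -> bool:
    """Deterministic assignment: same user, same group, for the whole test."""
    bucket = int(hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest(), 16)
    return (bucket % 10_000) / 10_000 < holdout_pct

# in_holdout(...) is True -> the user gets the non-AI version for the full
# 30+ day window; everyone else gets the AI variant. Change the salt and
# you get a fresh, independent split for the next test.
```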
The teams that get the most out of AI personalisation treat it as a series of small bets validated individually, not a platform decision validated by the vendor. The teams that get the least flip everything on, watch dashboards trend up due to MPP and seasonality, and write a case study that won't survive an audit of the underlying data.
One thing to do Monday: name which of the three capabilities (predictive, generative, optimisation) you actually want, name the program you'll run it on, and confirm that program has a holdout. If any of those three is missing, that's the work — not the model.
Frequently asked questions
- Do I need a CDP before deploying AI personalisation?
- Not necessarily. A unified user profile is the requirement; a customer data platform (CDP — a separate system that consolidates user data from many sources) is one way to build it but not the only way. Braze, Iterable, and Klaviyo can serve as the unified profile layer for many lifecycle use cases without a separate CDP. The decision turns on whether the personalisation needs to span multiple downstream tools (web, app, ads, support) — if yes, a CDP starts paying for itself. If the AI personalisation lives entirely inside the ESP, a CDP is optional infrastructure.
- How much data do I need for predictive models to work?
- ESP-built predictive models typically need at least 10,000 users with 90+ days of behaviour to produce stable scores. Below that, the model falls back to category averages or refuses to score. For custom-built models on specific use cases (churn, propensity, recommendations), 50,000+ users with rich event histories is a more realistic floor. Programs below those thresholds can still benefit from optimisation features (send-time optimisation, multi-armed bandit testing), which need less data per user.
- Should I use BrazeAI for subject line generation?
- BrazeAI (formerly Sage AI) works best as an acceleration tool for marketers who already know their voice — it produces variants you select and refine, not finished copy you ship blind. The lift comes from generating five candidates in 30 seconds instead of 10 minutes, which lets you test more often. The pattern that fails: shipping BrazeAI output unreviewed. Brand voice drift accumulates and the program ends up sounding like every other program using the same model.
- How do I know if AI personalisation is actually working?
- A holdout group that receives the non-AI version of the same program. Compare downstream conversion or revenue (not opens, which are corrupted by Apple Mail Privacy Protection) over 30+ days. If the AI version doesn't produce a statistically meaningful lift against the holdout, the AI isn't earning its place in the program. Vendor case studies aren't validation; they're marketing.
- What's the biggest mistake teams make with AI personalisation?
- Treating it as a feature flip rather than an architectural commitment. Turning on Predictive Suite or BrazeAI without first auditing event quality, profile unification, content modularity, and activation logic produces a model that runs on broken inputs. The output looks like personalisation; the lift never materialises; the team blames the model. The model wasn't the problem.
Related guides
Predictive models in lifecycle: churn, propensity, and recommendations without the magic
Predictive models in lifecycle are mostly three things: churn risk, conversion propensity, and product recommendations. Each one earns or loses its place based on whether its score actually changes a decision. Here's the operator view of what's worth deploying, what to expect from ESP-native suites, and when to build your own.
Building a personal chief-of-staff AI on Claude Routines
A real chief of staff used to mean a salary line on an exec's budget. Anthropic's Routines feature — Claude running on a schedule with access to your work tools — pulls the job inside reach of one operator. This is the architecture: morning brief, hourly interactive layer, midday drift check, evening debrief with end-of-day reconciliation, Sunday weekly review. Plus the draft-react protocol that lets the assistant act without auto-sending, calendar work blocks that double as the task tracker, and memory files the system writes into. Brain in a GitHub repo, runtime in claude.ai, no servers.
Segmentation strategy: beyond RFM
RFM is the floor of audience segmentation, not the ceiling. Every program that stops there ends up describing what users already did without ever predicting what they'll do next. Here's the segmentation stack that actually drives lifecycle decisions — and how to build it in Braze without ending up with 400 segments nobody understands.
Lifecycle marketing for flat products
The standard lifecycle playbook assumes weekly engagement and tidy stage progression. Most real products aren't shaped like that. This is how to design lifecycle — the messaging program that nudges users through their relationship with a product — for things people use once a year, once a quarter, or whenever they happen to need you. The textbook quietly makes those programs worse.
Generative AI for lifecycle content: where it earns its place and where it embarrasses you
Generative AI inside lifecycle ESPs has moved from novelty to default in 18 months. BrazeAI (formerly Sage AI), Iterable Copy Assist, Klaviyo's subject line generator — they all promise per-message copy at scale. Some uses are genuinely useful. Others are a fast path to brand drift, factual errors, and reputational damage. Here's the line.
What is lifecycle marketing? A field guide for operators starting from zero
If you're new to CRM and lifecycle, the field reads like a pile of acronyms and vendor demos. It's actually one simple idea executed across five canonical programs. Here's the frame that makes the rest of the library make sense.
Use this in Claude
Run this methodology inside your Claude sessions.
Orbit turns every guide on this site into an executable Claude skill — 63 lifecycle methodologies, 91 MCP tools, native Braze integration. Free for everyone.