Updated · 9 min read
Predictive models in lifecycle: churn, propensity, and recommendations without the magic
Picture a weather forecaster who tells you, every morning, that there's a 73% chance of rain. Useful — if you change your behaviour. You bring a coat, you reschedule the picnic, you delay the paint job. Useless — if you wear the same outfit and walk the same route regardless. Predictive models in lifecycle marketing are weather forecasts for your users. The chart goes up and to the right. The score is real. The harder question is whether anyone packs a different coat. This guide covers the three predictions that actually move lifecycle outcomes — churn, propensity, recommendations — and the operator-level questions that decide whether each one earns its place in your stack.

By Justin Williames
Founder, Orbit · 10+ years in lifecycle marketing
If you're new here: what a predictive model actually is
A predictive model — a piece of software that learns patterns from past behaviour and outputs a probability for a future event — is, in lifecycle marketing, almost always one of three flavours. It looks at what your users have done, finds the patterns that preceded the outcome you care about, and gives every user a score for how likely they are to do that thing next.
The doctor analogy lands cleanest. A GP who's seen ten thousand patients can tell you, after a five-minute consult, that the bloke in front of her has a high chance of a cardiac event in the next year — not because she has a crystal ball but because she's pattern-matched against the ten thousand. A predictive model does the same trick across millions of users, on behavioural data rather than blood pressure. Same logic. More users. No bedside manner.
The three flavours, before we go any further:
Churn risk — probability the user will lapse (stop using the product) in a defined window.
Propensity — probability the user will do the thing you want (purchase, upgrade, sign up) in a defined window.
Recommendations — a ranked list of products or articles per user, ordered by predicted relevance.
Each one outputs a number. The job is deciding whether the number changes anything you do.
The score earns its place when it changes a decision
A predictive score earns its place when it changes a decision. A score that ranks users you would have segmented the same way without it is decoration — interesting to look at, expensive to maintain, and quietly trained on data the team should be using directly.
That's the test. Strip away the dashboards, the model accuracy charts, the AUC numbers (AUC — area under the curve, a single number from 0.5 to 1.0 measuring how well a model separates the two outcomes). The only question that matters: does this score cause your program to do something different? If the answer is no, you've bought a thermometer for a room you're not heating.
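For concreteness, here is a minimal sketch of the AUC number that parenthetical describes, using toy labels and scores with scikit-learn's standard metric. Nothing here is ESP-specific; it's just the number those accuracy charts summarise, and a high value alone doesn't answer the decision question above.

```python
# Minimal sketch of what AUC summarises: given each user's predicted score and
# what they actually did, roc_auc_score measures how often the model ranks a
# true churner above a non-churner. 0.5 = coin flip, 1.0 = perfect separation.
# Labels and scores below are illustrative only.
from sklearn.metrics import roc_auc_score

actual_outcome = [1, 0, 1, 1, 0, 0, 1, 0]                      # 1 = churned, 0 = stayed
predicted_score = [0.81, 0.34, 0.67, 0.92, 0.45, 0.12, 0.58, 0.71]

auc = roc_auc_score(actual_outcome, predicted_score)
print(f"AUC: {auc:.2f}")  # how cleanly the scores separate the two outcomes
```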
Through that lens, here are the three use cases that actually pay for themselves.
Churn / risk of inactivity. Output: probability the user will lapse in the next 30 / 60 / 90 days. Used to trigger save flows — messages designed to pull a user back before they go quiet — instead of win-back campaigns sent after they've already gone. Real ROI depends on whether your save flow actually saves anyone. A churn model feeding a save flow that doesn't work is just an expensive way to identify the problem you can't solve. Test the save flow first. Deploy the model only when the flow has proven incremental lift — meaning users who got it converted at a higher rate than a holdout (the random users you deliberately don't message, your control group) who didn't. A sketch of that lift check follows after these three items.
Conversion propensity. Output: probability the user will convert (purchase, upgrade, sign up) in a defined window. Used to suppress low-propensity users from heavy promotional pushes — saving deliverability (your reputation with mailbox providers) and user goodwill — or to time high-intent triggered programs. Most valuable for SaaS trial-to-paid programs and ecommerce cart abandonment, where the audience is heterogeneous and the cost of a wasted send is real.
Product / content recommendations. Output: a ranked list of items per user. Used in onboarding, post-purchase, browse abandonment, and newsletter content selection. The use case where AI personalisation pays for itself most clearly — once the catalog is large enough that hand-curation breaks down. Below ~200 SKUs (stock-keeping units, the unique products in your catalog) or content items, hand-curation by lifecycle stage usually outperforms.
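On the churn item's "proven incremental lift" requirement: a minimal sketch of the holdout check, assuming you can pull four counts from your ESP or warehouse. The numbers and the two-proportion z-test are illustrative, not a prescribed methodology.

```python
# Minimal sketch of the incremental-lift check: did users who received the save
# flow convert at a higher rate than the randomly held-out users who didn't?
# Counts below are illustrative; pull the real ones from your ESP or warehouse.
from statsmodels.stats.proportion import proportions_ztest

treated_converted, treated_total = 420, 10_000   # got the save flow
holdout_converted, holdout_total = 310, 10_000   # randomly held out, no message

treated_rate = treated_converted / treated_total
holdout_rate = holdout_converted / holdout_total
lift = (treated_rate - holdout_rate) / holdout_rate

# Two-proportion z-test: is the difference bigger than chance would explain?
z_stat, p_value = proportions_ztest(
    [treated_converted, holdout_converted],
    [treated_total, holdout_total],
)

print(f"treated {treated_rate:.2%} vs holdout {holdout_rate:.2%}, lift {lift:+.1%}, p={p_value:.3f}")
```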
Other predictive features exist — predicted LTV (lifetime value, the total revenue a user is forecast to generate), churn cause attribution, optimal discount level — but the three above carry most of the measurable revenue lift in lifecycle programs. Start there. Expand once each is proven in your context.
What your ESP already ships — Braze, Iterable, Klaviyo, Salesforce
Your ESP (email service provider — the platform that sends your messages and stores your user data) probably ships predictive features already. Worth knowing what you're actually getting before you pay for anything bespoke.
Braze Predictive Suite. Out-of-the-box churn risk and conversion propensity scored against custom-defined events — you tell Braze what counts as "converted" or "churned" for your business and it scores users against that. Configurable target event and lookback window. Output is a per-user score available in segment filters. Strengths: zero build effort, integrates natively with Canvas (Braze's journey builder). Weaknesses: limited transparency into how the model arrives at a score, dependence on Braze having complete behavioural data on the user, and scoring quality that plateaus with audience size.
Iterable Brain. Send-time optimisation, frequency optimisation, channel selection. Less focused on score-based segmentation, more on per-message decisions — which channel, which time, how often. Best for programs running multi-channel and needing per-user channel selection.
Klaviyo Predictive Analytics. Predicted next-purchase date, predicted CLV (customer lifetime value), churn risk. Strong for ecommerce because the model is purpose-built for transactional data. Available as filterable user properties, like Braze. Strength: tight integration with Klaviyo's ecommerce-native data model. Weakness: harder to extend to non-ecommerce use cases.
Salesforce Einstein. Most extensive feature set — engagement scoring, send-time, content selection, journey decisions — but also the most setup, the steepest learning curve, and model quality meaningfully tied to how complete and clean your SFMC data extensions (Salesforce's term for the database tables it stores user data in) are.
When the ESP-native version is genuinely enough
The realistic answer: most of the time, for most programs, for most use cases. ESP-native predictive features are good enough when:
Generic use case — "score users by likelihood to churn in 30 days" rather than "score users by likelihood to churn for reason X given product feature usage pattern Y."
Large enough audience — typically 50K+ active users with 6+ months of clean event data (the stream of user actions the model trains on). Below that, ESP models fall back to weak defaults regardless of vendor. Models need examples to learn from; small audiences don't produce enough.
Activation stays inside the ESP — segment filters, Canvas branching, Liquid conditions (Liquid is the templating language most ESPs use to inject personalised values). Pulling the score out of the ESP for use elsewhere is where ESP-native starts to feel constraining.
No internal ML team to maintain a custom model. Hidden cost of custom is the ongoing retraining, drift monitoring, and on-call when the recommendation API breaks at 9pm on a Friday — the build itself is the easy part.
When the off-the-shelf version stops being enough
Custom models earn their cost in three patterns:
Domain-specific signal. Style-similarity recommendations for a fashion marketplace. Intent scoring tied to specific product feature usage for a B2B SaaS. Personalised content rankings for a publisher where the "catalog" is articles. The signals (the inputs the model learns from — clicks, purchases, dwell time, feature usage) that produce useful predictions are too domain-specific for a generic ESP model to learn from a generic event stream.
Multi-system activation. The score is needed in the ESP, on the website, in the app, in paid retargeting audiences, and in support tools. ESP-native scores can be exported but the architecture quickly tips toward "build the model in the warehouse, sync the score everywhere" when the activation surface is broad.
Audit and explainability. Regulated industries — finance, health, insurance — often need to explain why a specific user got a specific score. ESP-native black-box scores fail this test. Custom models with documented features (the named inputs the model uses) and feature importance (a ranking of which inputs mattered most for each prediction) pass it.
The defensible build pattern: model trained in the warehouse (BigQuery, Snowflake, Databricks — the cloud databases where your raw event data lives), output written nightly to user attributes via reverse-ETL (Census, Hightouch — tools that push warehouse data back into operational systems like your ESP), and consumed via standard segment filters. The ESP doesn't need to know it's a predictive model — it just sees a user attribute it can filter on.
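A minimal sketch of the nightly scoring job in that pattern. The table names, feature columns, model file, and connection string are assumptions for illustration; the reverse-ETL tool then syncs the output table into the ESP as an ordinary user attribute.

```python
# Minimal sketch of the nightly warehouse scoring job. Everything named here
# (tables, columns, model file, connection string) is an assumption for
# illustration, not a fixed schema.
import joblib
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("snowflake://...")          # or BigQuery / Databricks
model = joblib.load("churn_model.pkl")             # trained offline, versioned elsewhere

# 1. Pull the per-user feature snapshot the model was trained on.
features = pd.read_sql("SELECT * FROM analytics.user_features_daily", engine)

# 2. Score every user: probability of lapsing in the next 30 days.
features["churn_risk_30d"] = model.predict_proba(
    features.drop(columns=["user_id"])
)[:, 1]

# 3. Write the scores back; reverse-ETL (Census / Hightouch) syncs this table
#    to the ESP as a user attribute that segment filters can read.
features[["user_id", "churn_risk_30d"]].to_sql(
    "user_churn_scores", engine, schema="activation", if_exists="replace", index=False
)
```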
Real-time variations of the same pattern use Connected Content (Braze) or Catalog Lookup (Iterable) — features that let your ESP call an external system at the moment a message is being assembled — to hit a model endpoint at send time. Adds latency and a runtime dependency, but keeps recommendations fresh. The architecture guide covers the trade-off.
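For flavour, here's a minimal sketch of the send-time variant: a small HTTP endpoint the ESP could call while a message is being assembled. FastAPI, the route, and the in-memory lookup are illustrative assumptions, not a reference implementation; the real version sits behind whatever serving stack your team already runs, with a strict timeout and a template fallback for when the call fails.

```python
# Minimal sketch of a recommendation endpoint the ESP could call at send time.
# Framework, route, and the in-memory score store are illustrative assumptions.
from fastapi import FastAPI

app = FastAPI()

# In production this would be a feature store or a model call, not a dict.
PRECOMPUTED_RECS = {
    "user_123": [{"sku": "A-100", "score": 0.91}, {"sku": "B-204", "score": 0.77}],
}

@app.get("/recommendations/{user_id}")
def recommendations(user_id: str, limit: int = 3):
    # Always return something renderable: the message template needs a fallback
    # for when the user is unknown or the model is down.
    recs = PRECOMPUTED_RECS.get(user_id, [])[:limit]
    return {"user_id": user_id, "items": recs}
```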
The five questions to answer before turning anything on
Before deploying any predictive feature — built or bought — answer these. Most of the time at least one of the answers reveals that the model isn't the bottleneck.
1. What decision does the score change? If users in the top quintile (top 20% by score) get a different message, journey, or send than users in the bottom quintile, the score is doing work. If everyone gets the same treatment regardless, you're paying for a dashboard.
2. What does the score replace? The honest comparison isn't "model vs nothing" — it's "model vs the heuristic the team would use otherwise." A heuristic is a simple rule of thumb: "active in last 30 days = yes/no." A churn model often goes head-to-head with that exact heuristic. If the model can't beat it in a holdout test (where you score users with both methods and compare which one predicts the actual outcome better), it's expensive complexity. A sketch of that comparison follows after these five questions.
3. How will it be measured? A predictive model needs a holdout. Some users get the predicted-action treatment; some get the previous heuristic-based treatment. Compare downstream metrics over 60+ days. Anything shorter is noise. Anything without a holdout is faith.
4. What happens when it's wrong? Models drift — their predictions get worse over time as the world changes underneath them. Data pipelines break. The score that was 0.92 last week is 0.31 this week, not because the user changed but because an upstream event stopped firing and the model lost half its inputs. Build the monitoring before the deployment, not after the first incident. One simple drift check is sketched after this list.
5. Who owns it? Predictive models without a clear owner go stale within months. The owner is responsible for retraining cadence (how often the model relearns from fresh data), drift monitoring, and the question "is this still earning its place?" Without an owner, the score becomes load-bearing infrastructure that nobody tests, updates, or trusts.
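On question 2, a minimal sketch of the head-to-head: score the same holdout users with the model and with the "active in last 30 days" heuristic, then check which one better separates the users who actually churned. The file and column names are assumptions for illustration.

```python
# Minimal sketch of question 2: does the model beat the heuristic it replaces?
# holdout.csv is assumed to contain, per user: the model score, days since last
# activity at scoring time, and whether they actually churned in the window.
import pandas as pd
from sklearn.metrics import roc_auc_score

holdout = pd.read_csv("holdout.csv")  # columns: model_score, days_since_active, churned

# The heuristic the team would use anyway: inactive for 30+ days = at risk.
heuristic_score = (holdout["days_since_active"] >= 30).astype(int)

model_auc = roc_auc_score(holdout["churned"], holdout["model_score"])
heuristic_auc = roc_auc_score(holdout["churned"], heuristic_score)

print(f"model AUC {model_auc:.3f} vs heuristic AUC {heuristic_auc:.3f}")
# If the gap is small, the model is expensive complexity; keep the heuristic.
```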
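And on question 4, one simple drift check: compare this week's score distribution to the distribution the model produced at validation time. The population stability index and the 0.2 threshold are common conventions rather than requirements; the point is that the check runs on a schedule, not after the first incident.

```python
# Minimal sketch of a score-drift check: compare today's score distribution to
# the distribution at validation time using PSI. Bin count and the 0.2
# threshold are conventions, not requirements.
import numpy as np

def population_stability_index(baseline_scores, current_scores, bins=10):
    """PSI between two score distributions; above ~0.2 is a common 'investigate' signal."""
    # Bin edges come from the baseline distribution (scores at validation time).
    edges = np.unique(np.quantile(baseline_scores, np.linspace(0, 1, bins + 1)))
    edges[0], edges[-1] = -np.inf, np.inf               # catch out-of-range scores
    base_pct = np.histogram(baseline_scores, bins=edges)[0] / len(baseline_scores)
    curr_pct = np.histogram(current_scores, bins=edges)[0] / len(current_scores)
    base_pct = np.clip(base_pct, 1e-6, None)            # avoid log(0) and divide-by-zero
    curr_pct = np.clip(curr_pct, 1e-6, None)
    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))

# baseline_scores = scores the model produced when it was validated
# current_scores  = scores it produced this week
# if population_stability_index(baseline_scores, current_scores) > 0.2:
#     alert the owner before the program quietly degrades
```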
The AI Personalisation skill covers the rollout framework end-to-end, including the build vs buy decision matrix and the holdout design for each of the three primary use cases.
The patterns that look impressive and quietly do nothing
Predictive scores nobody filters on. A common pattern: turn on Predictive Suite, watch the scores populate, never actually use them in segment logic — because the existing program already segments on recency and tier and that's what the team trusts. The model trains. The dashboards trend. No decision changes. Either commit to using the score in segmentation or don't turn it on.
Predictive features sold as content selection. "The AI picks the best subject line per user." Most ESP implementations are picking from a small candidate set the marketer pre-defined, weighted by historical engagement on similar users. Useful, but closer to a multi-armed bandit (a class of algorithm that explores options and shifts traffic toward the winners) than per-user content generation. Set expectations accordingly.
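For intuition on that comparison, here's a minimal sketch of Thompson sampling over a marketer-written candidate set, which is roughly the class of algorithm the paragraph describes. The subject lines and counts are illustrative; the key property is that the system chooses among options a human wrote, shifting traffic toward whichever earns more opens.

```python
# Minimal sketch of a multi-armed bandit (Thompson sampling) over a fixed,
# marketer-written candidate set. Subject lines and counts are illustrative.
import random

# Per candidate: how many times it was sent, and how many opens it earned.
candidates = {
    "Your cart misses you":    {"sends": 1200, "opens": 260},
    "Still thinking it over?": {"sends": 1150, "opens": 310},
    "10% off, today only":     {"sends":  900, "opens": 270},
}

def pick_subject_line():
    # Sample a plausible open rate for each line from its Beta posterior, then
    # send the line with the highest sampled rate. Winners get more traffic;
    # losers still get occasional exploration.
    def sampled_rate(stats):
        return random.betavariate(stats["opens"] + 1, stats["sends"] - stats["opens"] + 1)
    return max(candidates, key=lambda line: sampled_rate(candidates[line]))

print(pick_subject_line())
```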
Recommendations that override hand-curation. For small catalogs (200 SKUs or fewer), hand-curated recommendations by lifecycle stage usually outperform model-generated. Programs that switch to AI recommendations and lose curation often see flat or negative impact. Recommendations are a tool for catalog scale, not a replacement for category strategy.
The pattern that doesn't disappoint: a predictive feature deployed to one program, validated against a holdout on a non-vanity metric for at least 60 days, expanded only once the lift is real. Slow, careful, and accumulates more revenue than the "flip every switch" rollout.
Frequently asked questions
- Should I use Braze Predictive Suite or build my own model?
- Use Predictive Suite for generic churn and propensity at standard scale. Build your own when the use case is domain-specific (style similarity, intent scoring tied to product features), the activation surface spans multiple systems, or the audit / explainability requirements demand transparency. Most programs end up with a hybrid: ESP-native for the simple cases, custom models in the warehouse for the cases where domain specificity matters.
- How long do predictive models take to deploy?
- ESP-native: hours to a couple of weeks, depending on event hygiene. Custom warehouse-trained models: 2–6 months including data engineering, model build, validation, and activation. The unsexy truth: most of the timeline is data work, not model work. Programs that say 'we built it in two weeks' usually built it on top of a data layer someone else built before them.
- Can predictive models work for small audiences?
- Below 10K active users with at least 90 days of clean events, ESP-native predictive features fall back to weak defaults and don't earn their place. Below 50K, custom models tend to overfit — meaning they learn the quirks of your specific historical data rather than patterns that generalise to new users. For small audiences, simple heuristic segmentation (recency, frequency, tier) usually outperforms the predictive layer. The right time to deploy is when audience scale and event richness justify the complexity.
- What's the difference between propensity and intent scoring?
- Propensity scores predict the probability of an action over a defined future window — 'probability of purchase in next 30 days'. Intent scoring is typically near-real-time, surfacing users showing high signal right now (currently browsing, currently engaging). Propensity is for medium-horizon segmentation; intent is for fast-trigger programs. They use overlapping data but answer different questions and feed different programs.
- How often should predictive models be retrained?
- ESP-native models retrain on a schedule the vendor controls — typically weekly. Custom models depend on the use case: ecommerce recommendations daily or weekly, churn models monthly, LTV models quarterly. The signal is drift — when the model's predictions stop matching observed outcomes, retrain. Build drift monitoring as part of the deployment, not as a future project.
Related guides
AI personalisation at scale: the architecture that actually works
Every ESP now sells an AI personalisation layer. Most teams turn it on and quietly notice the lift is smaller than the sales deck promised. The model isn't the problem — the plumbing underneath is. Here's the data, content and activation stack that decides whether AI personalisation moves revenue or just moves dashboards.
Lifecycle marketing for flat products
The standard lifecycle playbook assumes weekly engagement and tidy stage progression. Most real products aren't shaped like that. This is how to design lifecycle — the messaging program that nudges users through their relationship with a product — for things people use once a year, once a quarter, or whenever they happen to need you. The textbook quietly makes those programs worse.
Building a personal chief-of-staff AI on Claude Routines
A real chief of staff used to mean a salary line on an exec's budget. Anthropic's Routines feature — Claude running on a schedule with access to your work tools — pulls the job inside reach of one operator. This is the architecture: morning brief, hourly interactive layer, midday drift check, evening debrief with end-of-day reconciliation, Sunday weekly review. Plus the draft-react protocol that lets the assistant act without auto-sending, calendar work blocks that double as the task tracker, and memory files the system writes into. Brain in a GitHub repo, runtime in claude.ai, no servers.
Generative AI for lifecycle content: where it earns its place and where it embarrasses you
Generative AI inside lifecycle ESPs has moved from novelty to default in 18 months. BrazeAI (formerly Sage AI), Iterable Copy Assist, Klaviyo's subject line generator — they all promise per-message copy at scale. Some uses are genuinely useful. Others are a fast path to brand drift, factual errors, and reputational damage. Here's the line.
What is lifecycle marketing? A field guide for operators starting from zero
If you're new to CRM and lifecycle, the field reads like a pile of acronyms and vendor demos. It's actually one simple idea executed across five canonical programs. Here's the frame that makes the rest of the library make sense.
Segmentation strategy: beyond RFM
RFM is the floor of audience segmentation, not the ceiling. Every program that stops there ends up describing what users already did without ever predicting what they'll do next. Here's the segmentation stack that actually drives lifecycle decisions — and how to build it in Braze without ending up with 400 segments nobody understands.