
Customer Health Scoring

Building a health score that actually predicts retention and expansion — not just one that looks defensible in a board deck.

A health score is an attempt to compress the state of an account into a single number or color. Done well, it is the most useful operational instrument a CS team has: it triages attention, makes risk legible across the org, and turns retention from a reactive scramble into a managed portfolio. Done badly, it is theater: a green number on a dashboard while the account quietly slides toward churn.

The difference between a real score and a vanity score is whether it has been validated against actual outcomes (renewal, expansion, churn) and whether it actually changes behavior. If neither is true, it is decoration.

Defining health scoring systems

A health score is a weighted composite of inputs that together estimate the probability of the account renewing and expanding. The output is typically a single number (0–100), a color (red/yellow/green), or both.
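As a minimal sketch of that composite (the input names, weights, and band thresholds below are illustrative assumptions, not values this guide prescribes):

```python
# Minimal sketch of a weighted composite health score.
# Input names, weights, and thresholds are illustrative.

WEIGHTS = {
    "product_usage": 0.25,
    "engagement": 0.20,
    "support_health": 0.15,
    "relationship_strength": 0.25,
    "value_realization": 0.15,
}

def health_score(inputs: dict[str, float]) -> tuple[int, str]:
    """inputs: each signal already normalized to 0.0-1.0."""
    score = sum(WEIGHTS[k] * inputs[k] for k in WEIGHTS) * 100
    if score >= 70:
        color = "green"
    elif score >= 40:
        color = "yellow"
    else:
        color = "red"
    return round(score), color

score, color = health_score({
    "product_usage": 0.95,       # heavy usage...
    "engagement": 0.40,
    "support_health": 0.80,
    "relationship_strength": 0.20,  # ...but weak relationships
    "value_realization": 0.50,
})
print(score, color)  # 56 yellow
```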

Two design philosophies exist:

  • Predictive scores: built from historical data on what actually predicted renewal and expansion in your customer base. Higher accuracy, harder to build, requires meaningful data on past renewals and churns.
  • Heuristic scores: built from informed judgment about what *should* predict outcomes (usage, engagement, etc.), with weights set by experienced CSMs and AMs. Easier to build, faster to deploy, but should be migrated toward predictive over time.

Most organizations start heuristic and evolve toward predictive as they accumulate enough renewal and churn data to validate the weights. The mistake is treating a heuristic score as if it were predictive, assuming the green color means low churn risk when it has never actually been validated against churn.
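One common way to make that migration (a sketch, assuming scikit-learn is available; the feature names and the tiny history below are illustrative) is to fit a simple classifier on past renewal outcomes and compare its coefficients against the hand-set weights:

```python
# Sketch: deriving predictive weights from historical renewal outcomes.
# Assumes scikit-learn; features and data are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Rows: past accounts snapshotted 90 days before renewal; columns match
# the heuristic inputs, each normalized to 0.0-1.0.
X = np.array([
    [0.90, 0.80, 0.70, 0.90, 0.60],  # renewed
    [0.95, 0.20, 0.80, 0.10, 0.30],  # churned despite heavy usage
    [0.40, 0.90, 0.60, 0.80, 0.90],  # renewed despite low usage
    [0.30, 0.10, 0.20, 0.20, 0.10],  # churned
])
y = np.array([1, 0, 1, 0])  # 1 = renewed, 0 = churned/downsold

model = LogisticRegression().fit(X, y)

# Coefficient magnitude indicates how much each input actually predicted
# renewal in this base; use it to sanity-check (or replace) the
# hand-set heuristic weights.
names = ["usage", "engagement", "support", "relationship", "value"]
for name, coef in zip(names, model.coef_[0]):
    print(f"{name}: {coef:+.2f}")
```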

Common inputs

A robust health score draws from multiple categories so that no single signal can dominate. Typical inputs:

  • Product usage: DAU/MAU, license utilization, depth of feature adoption, breadth of teams using the product.
  • Engagement: meeting cadence with the buying team, response times, QBR attendance, executive engagement.
  • Support activity: ticket volume, severity, time to resolution, escalations, CSAT or NPS trends.
  • Relationship strength: champion confirmed and active, executive sponsor engaged, multi-threaded coverage of the buying team, no single-point-of-failure relationships.
  • Commercial signals: payment timeliness, contract maturity, expansion activity, willingness to be referenced.
  • Value realization: the customer can articulate ROI, has measured outcomes against the original business case, has shared results internally.

The trap: weighting too heavily toward easy-to-measure signals (usage, tickets) and underweighting the qualitative signals that actually predict retention (champion strength, value realization, executive engagement). The hardest signals to measure are often the most predictive.

Quantitative vs qualitative signals

A score built only from quantitative data is structurally blind. Two examples:

  • An account with 95% license utilization, low ticket volume, and on-time payments looks green. But the original champion left three months ago, the new VP has never met your team, and there is no executive relationship. The quantitative model misses all of this. The account churns at renewal with 60 days' notice, and the scoring system records an 'unexpected' loss.
  • An account with 40% license utilization and several open escalations looks red. But the customer is in the middle of a planned migration, the champion is highly engaged, and a 3x expansion is being negotiated. The quantitative model triggers a churn-risk intervention that confuses the customer and damages the relationship.

The operational answer is to overlay a qualitative scorecard, maintained by the CSM/AM, on top of the quantitative score. Senior orgs require the AM to weigh in monthly with three qualitative inputs: champion status (active/at risk/missing), executive engagement (active/lapsed/none), and outcome narrative (defensible/in development/missing). When the quantitative and qualitative diverge, the qualitative usually wins, and the divergence itself is a signal worth investigating.
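A minimal sketch of that overlay, encoding the three monthly inputs and the divergence rule (the status vocabularies come from the paragraph above; the numeric scoring of each status and the band cutoffs are assumptions):

```python
# Sketch of the qualitative overlay. Status vocabularies mirror the
# three monthly inputs; the risk points and cutoffs are assumptions.

QUAL_RISK = {
    "champion": {"active": 0, "at_risk": 1, "missing": 2},
    "exec_engagement": {"active": 0, "lapsed": 1, "none": 2},
    "outcome_narrative": {"defensible": 0, "in_development": 1, "missing": 2},
}

def overlay(quant_color: str, qual: dict[str, str]) -> tuple[str, bool]:
    """Return (effective color, divergence flag)."""
    risk = sum(QUAL_RISK[k][v] for k, v in qual.items())
    qual_color = "green" if risk <= 1 else "yellow" if risk <= 3 else "red"
    # When the two views diverge, the qualitative read usually wins,
    # and the divergence itself is flagged for investigation.
    diverged = qual_color != quant_color
    return (qual_color if diverged else quant_color), diverged

# A "green" quantitative score with a missing champion and no
# executive engagement becomes red, and the divergence is flagged.
print(overlay("green", {
    "champion": "missing",
    "exec_engagement": "none",
    "outcome_narrative": "in_development",
}))  # ('red', True)
```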

Building a repeatable scoring model

A practical build process:

  1. Pick 5–8 inputs maximum. More inputs feel rigorous; in practice they dilute the signal and make the score impossible to debug. Five well-chosen inputs outperform fifteen weak ones.
  2. Weight by predictive value, not by what is easy to measure. If champion strength predicts renewal better than license utilization does, it should get a higher weight even though it is harder to capture.
  3. Define thresholds for color bands. Green/yellow/red is more actionable than a continuous score. The bands should map to specific intervention cadences (see next section).
  4. Validate quarterly. Pull the last 90 days of renewals, downsells, and churns. Did the score correctly classify them at 60 and 90 days out? If not, adjust the weights and inputs (a backtest sketch follows this list).
  5. Hold the model accountable. A model that does not improve in accuracy over time is being maintained as theater. Either fix it or decommission it.
  6. Keep the inputs visible to the CSM. A black-box score that the CSM cannot explain to their manager destroys trust in the system. Every score should be decomposable into its underlying inputs.
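A minimal sketch of the quarterly backtest in step 4 (the record shape and the rule for what counts as a correct classification are assumptions):

```python
# Sketch of the quarterly validation: did the color assigned 90 days
# before the renewal date classify the outcome correctly?
# Record shape and the "correct" rule are illustrative assumptions.

RECORDS = [
    # (color 90 days out, outcome); outcome: "renewed" | "churned" | "downsold"
    ("green", "renewed"),
    ("green", "churned"),   # miss: looked healthy, still lost
    ("yellow", "renewed"),
    ("red", "churned"),
    ("red", "downsold"),
]

def correct(color: str, outcome: str) -> bool:
    # Treat green as predicting retention and red as predicting loss;
    # yellow counts as correct either way, since it exists to demand
    # intervention rather than to predict an outcome.
    if color == "green":
        return outcome == "renewed"
    if color == "red":
        return outcome in ("churned", "downsold")
    return True

accuracy = sum(correct(c, o) for c, o in RECORDS) / len(RECORDS)
print(f"90-day classification accuracy: {accuracy:.0%}")  # 80%
```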

Using health scores to prioritize actions

The score is an input to action, not the output of analysis. Tie each color band to a specific operating cadence (a configuration sketch follows the list):

  • Green: monthly check-in, quarterly expansion scan, light-touch engagement. The risk is under-investing time in accounts that are quietly in a slow decline; audit accounts that have been green for 4+ quarters for hidden complacency.
  • Yellow: biweekly review, formal action plan with named owner and date, escalation trigger if no improvement in 60 days. Yellow is the most operationally important band: it is where intervention is still cheap and the outcome is genuinely changeable.
  • Red: weekly war-room with the CSM, AM, support, and product; written recovery plan reviewed at the leadership level; explicit decision at 90 days whether to invest further or accept the loss. Red accounts that linger as red for 6+ months without resolution are usually evidence of organizational unwillingness to make the hard call, not of recoverable risk.
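One way to keep the band-to-cadence mapping explicit and inspectable is to encode it as configuration; a sketch, with field names as illustrative assumptions:

```python
# Sketch: encoding the band-to-cadence mapping as explicit config so
# the operating rhythm is inspectable. Values mirror the bands above;
# the field names are illustrative.

CADENCES = {
    "green": {
        "review": "monthly check-in",
        "extras": ["quarterly expansion scan"],
        "escalation": "audit if green 4+ consecutive quarters",
    },
    "yellow": {
        "review": "biweekly review",
        "extras": ["formal action plan with named owner and date"],
        "escalation": "escalate if no improvement in 60 days",
    },
    "red": {
        "review": "weekly war-room",
        "extras": ["written recovery plan reviewed by leadership"],
        "escalation": "invest-or-accept decision at 90 days",
    },
}

def next_actions(color: str) -> list[str]:
    c = CADENCES[color]
    return [c["review"], *c["extras"], c["escalation"]]

print(next_actions("yellow"))
```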

The portfolio view matters as much as the individual account view. If your portfolio shifts 10% from green to yellow in a quarter, that is a structural signal (a product issue, a market shift, or a coverage problem) that deserves leadership attention regardless of which specific accounts moved.
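A sketch of that portfolio-level check, flagging any band whose share moves more than a threshold quarter over quarter (the 10% threshold mirrors the example above; the data is illustrative):

```python
# Sketch of the portfolio-level check: flag a structural shift when the
# share of accounts in any band moves more than a threshold in one
# quarter. Threshold and data are illustrative.
from collections import Counter

def band_mix(colors: list[str]) -> dict[str, float]:
    counts = Counter(colors)
    return {b: counts[b] / len(colors) for b in ("green", "yellow", "red")}

def structural_shift(prev: list[str], curr: list[str], threshold: float = 0.10):
    prev_mix, curr_mix = band_mix(prev), band_mix(curr)
    return {
        band: delta
        for band in prev_mix
        if abs(delta := curr_mix[band] - prev_mix[band]) >= threshold
    }

q1 = ["green"] * 70 + ["yellow"] * 20 + ["red"] * 10
q2 = ["green"] * 58 + ["yellow"] * 32 + ["red"] * 10
print(structural_shift(q1, q2))  # roughly {'green': -0.12, 'yellow': 0.12}
```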

Real-world example

A SaaS company built a health score with 14 inputs covering usage, support, engagement, and commercial activity. After two years it correlated weakly with actual outcomes (~52% accuracy at 90 days out, barely better than chance). A new CS leader simplified the model to six inputs and added two qualitative CSM-maintained signals (champion status and outcome narrative). After the redesign, accuracy at 90 days improved to 81%. More importantly, the operational behavior changed: yellow accounts now generated specific action plans within seven days of being flagged, and the proportion of red accounts that recovered to green within 90 days rose from 18% to 47%. NRR improved from 102% to 119% over the following four quarters, with the bulk of the improvement attributed to earlier intervention on accounts the prior model had been classifying as green until the renewal quarter.
