Как посчитать Churn Risk Score в SQL
Содержание:
Зачем Churn Risk Score
Churn Risk Score = вероятность user churn в next N days. Action — retention campaigns target high-risk users до того, как ушёл. ML model или rule-based.
Rule-based scoring
Без ML, через SQL rules:
WITH user_signals AS (
SELECT
u.user_id,
CURRENT_DATE - COALESCE(MAX(a.DATE), u.created_at::DATE) AS days_inactive,
COUNT(a.DATE) FILTER (WHERE a.DATE >= CURRENT_DATE - INTERVAL '7 days') AS active_last_7d,
COUNT(a.DATE) FILTER (WHERE a.DATE >= CURRENT_DATE - INTERVAL '30 days') AS active_last_30d,
-- Decline indicator
COUNT(a.DATE) FILTER (WHERE a.DATE >= CURRENT_DATE - INTERVAL '7 days')::NUMERIC
/ NULLIF(COUNT(a.DATE) FILTER (WHERE a.DATE BETWEEN CURRENT_DATE - INTERVAL '14 days' AND CURRENT_DATE - INTERVAL '8 days'), 0) AS week_over_week_ratio,
EXISTS (SELECT 1 FROM support_tickets WHERE user_id = u.user_id AND created_at >= CURRENT_DATE - INTERVAL '30 days' AND priority = 'high') AS recent_complaint
FROM users u
LEFT JOIN activity a USING (user_id)
WHERE u.active = TRUE
GROUP BY u.user_id, u.created_at
)
SELECT
user_id,
days_inactive,
active_last_7d,
week_over_week_ratio,
-- Score (additive)
(CASE WHEN days_inactive > 14 THEN 30 ELSE 0 END) +
(CASE WHEN days_inactive > 30 THEN 30 ELSE 0 END) +
(CASE WHEN week_over_week_ratio < 0.5 THEN 20 ELSE 0 END) +
(CASE WHEN active_last_7d = 0 THEN 30 ELSE 0 END) +
(CASE WHEN recent_complaint THEN 20 ELSE 0 END) AS churn_risk_score
FROM user_signals
ORDER BY churn_risk_score DESC;Score 0-100. > 70 = high risk.
Композитный score
WITH user_score AS (
SELECT
user_id,
days_inactive,
engagement_trend,
complaint_score,
usage_decline,
-- Composite (weighted)
days_inactive * 0.4
+ engagement_trend * 0.3
+ complaint_score * 0.2
+ usage_decline * 0.1 AS raw_score
FROM user_signals
)
SELECT
user_id,
raw_score,
NTILE(10) OVER (ORDER BY raw_score) AS risk_decile,
CASE
WHEN NTILE(10) OVER (ORDER BY raw_score) >= 9 THEN 'CRITICAL'
WHEN NTILE(10) OVER (ORDER BY raw_score) >= 7 THEN 'HIGH'
WHEN NTILE(10) OVER (ORDER BY raw_score) >= 4 THEN 'MEDIUM'
ELSE 'LOW'
END AS risk_segment
FROM user_score;Калибровка
Verify rules predict actual churn:
WITH historical_scores AS (
SELECT user_id, churn_risk_score AS score_30d_ago
FROM user_score_snapshots
WHERE snapshot_date = CURRENT_DATE - INTERVAL '30 days'
),
churn_outcomes AS (
SELECT user_id, churned_at IS NOT NULL AS churned
FROM users
WHERE created_at < CURRENT_DATE - INTERVAL '30 days'
)
SELECT
NTILE(10) OVER (ORDER BY score_30d_ago) AS risk_decile,
COUNT(*) AS users,
SUM(CASE WHEN churned THEN 1 ELSE 0 END) AS churned,
SUM(CASE WHEN churned THEN 1 ELSE 0 END)::NUMERIC * 100 / COUNT(*) AS churn_rate_pct
FROM historical_scores
JOIN churn_outcomes USING (user_id)
GROUP BY 1
ORDER BY 1;Top decile должен иметь significantly higher churn rate чем bottom.
Targeting
High-risk → retention email:
SELECT
user_id,
email,
churn_risk_score,
days_inactive
FROM user_scores
WHERE churn_risk_score >= 70
AND email_opt_in = TRUE
ORDER BY churn_risk_score DESC
LIMIT 1000;Частые ошибки
Ошибка 1. Score too sensitive. Everyone «high risk» — useless. Calibrate thresholds.
Ошибка 2. Static rules. World changes. Re-calibrate quarterly.
Ошибка 3. Score без validation. Verify high-score actually churns. Else just guess.
Ошибка 4. Single signal. Days inactive alone weak. Composite multiple signals.
Ошибка 5. Survivor bias. Train on completed cohorts only. Right-censored handling needed.
Связанные темы
- Как посчитать churn в SQL
- Как посчитать days since last login в SQL
- Как посчитать win-back rate в SQL
- Churn prediction
FAQ
Rule-based vs ML?
ML — higher accuracy, complex maintenance. Rules — interpretable, faster start.
Какой threshold high-risk?
Calibrate: top decile (or top 20%) — high risk typically.
Re-score frequency?
Daily для high-velocity. Weekly для stable products.
Signals что использовать?
Days inactive, engagement decline, support tickets, plan downgrades, feature usage drop.
Score → action?
Critical → CSM call. High → discount / promo email. Medium → re-engagement automated.