Как посчитать Churn Risk Score в SQL

Закрепи формулу churn risk score в Карьернике
Запомнить надолго — 5 коротких сессий с задачами на эту тему. Бесплатно
Тренировать churn risk score в Telegram

Зачем Churn Risk Score

Churn Risk Score = вероятность user churn в next N days. Action — retention campaigns target high-risk users до того, как ушёл. ML model или rule-based.

Rule-based scoring

Без ML, через SQL rules:

WITH user_signals AS (
    SELECT
        u.user_id,
        CURRENT_DATE - COALESCE(MAX(a.DATE), u.created_at::DATE) AS days_inactive,
        COUNT(a.DATE) FILTER (WHERE a.DATE >= CURRENT_DATE - INTERVAL '7 days') AS active_last_7d,
        COUNT(a.DATE) FILTER (WHERE a.DATE >= CURRENT_DATE - INTERVAL '30 days') AS active_last_30d,
        -- Decline indicator
        COUNT(a.DATE) FILTER (WHERE a.DATE >= CURRENT_DATE - INTERVAL '7 days')::NUMERIC
        / NULLIF(COUNT(a.DATE) FILTER (WHERE a.DATE BETWEEN CURRENT_DATE - INTERVAL '14 days' AND CURRENT_DATE - INTERVAL '8 days'), 0) AS week_over_week_ratio,
        EXISTS (SELECT 1 FROM support_tickets WHERE user_id = u.user_id AND created_at >= CURRENT_DATE - INTERVAL '30 days' AND priority = 'high') AS recent_complaint
    FROM users u
    LEFT JOIN activity a USING (user_id)
    WHERE u.active = TRUE
    GROUP BY u.user_id, u.created_at
)
SELECT
    user_id,
    days_inactive,
    active_last_7d,
    week_over_week_ratio,
    -- Score (additive)
    (CASE WHEN days_inactive > 14 THEN 30 ELSE 0 END) +
    (CASE WHEN days_inactive > 30 THEN 30 ELSE 0 END) +
    (CASE WHEN week_over_week_ratio < 0.5 THEN 20 ELSE 0 END) +
    (CASE WHEN active_last_7d = 0 THEN 30 ELSE 0 END) +
    (CASE WHEN recent_complaint THEN 20 ELSE 0 END) AS churn_risk_score
FROM user_signals
ORDER BY churn_risk_score DESC;

Score 0-100. > 70 = high risk.

Композитный score

WITH user_score AS (
    SELECT
        user_id,
        days_inactive,
        engagement_trend,
        complaint_score,
        usage_decline,
        -- Composite (weighted)
        days_inactive * 0.4
        + engagement_trend * 0.3
        + complaint_score * 0.2
        + usage_decline * 0.1 AS raw_score
    FROM user_signals
)
SELECT
    user_id,
    raw_score,
    NTILE(10) OVER (ORDER BY raw_score) AS risk_decile,
    CASE
        WHEN NTILE(10) OVER (ORDER BY raw_score) >= 9 THEN 'CRITICAL'
        WHEN NTILE(10) OVER (ORDER BY raw_score) >= 7 THEN 'HIGH'
        WHEN NTILE(10) OVER (ORDER BY raw_score) >= 4 THEN 'MEDIUM'
        ELSE 'LOW'
    END AS risk_segment
FROM user_score;

Калибровка

Verify rules predict actual churn:

WITH historical_scores AS (
    SELECT user_id, churn_risk_score AS score_30d_ago
    FROM user_score_snapshots
    WHERE snapshot_date = CURRENT_DATE - INTERVAL '30 days'
),
churn_outcomes AS (
    SELECT user_id, churned_at IS NOT NULL AS churned
    FROM users
    WHERE created_at < CURRENT_DATE - INTERVAL '30 days'
)
SELECT
    NTILE(10) OVER (ORDER BY score_30d_ago) AS risk_decile,
    COUNT(*) AS users,
    SUM(CASE WHEN churned THEN 1 ELSE 0 END) AS churned,
    SUM(CASE WHEN churned THEN 1 ELSE 0 END)::NUMERIC * 100 / COUNT(*) AS churn_rate_pct
FROM historical_scores
JOIN churn_outcomes USING (user_id)
GROUP BY 1
ORDER BY 1;

Top decile должен иметь significantly higher churn rate чем bottom.

Закрепи формулу churn risk score в Карьернике
Запомнить надолго — 5 коротких сессий с задачами на эту тему. Бесплатно
Тренировать churn risk score в Telegram

Targeting

High-risk → retention email:

SELECT
    user_id,
    email,
    churn_risk_score,
    days_inactive
FROM user_scores
WHERE churn_risk_score >= 70
  AND email_opt_in = TRUE
ORDER BY churn_risk_score DESC
LIMIT 1000;

Частые ошибки

Ошибка 1. Score too sensitive. Everyone «high risk» — useless. Calibrate thresholds.

Ошибка 2. Static rules. World changes. Re-calibrate quarterly.

Ошибка 3. Score без validation. Verify high-score actually churns. Else just guess.

Ошибка 4. Single signal. Days inactive alone weak. Composite multiple signals.

Ошибка 5. Survivor bias. Train on completed cohorts only. Right-censored handling needed.

Связанные темы

FAQ

Rule-based vs ML?

ML — higher accuracy, complex maintenance. Rules — interpretable, faster start.

Какой threshold high-risk?

Calibrate: top decile (or top 20%) — high risk typically.

Re-score frequency?

Daily для high-velocity. Weekly для stable products.

Signals что использовать?

Days inactive, engagement decline, support tickets, plan downgrades, feature usage drop.

Score → action?

Critical → CSM call. High → discount / promo email. Medium → re-engagement automated.