Шпаргалка по cohorts и retention

Карьерник — квиз-тренажёр в Telegram с 1500+ вопросами для собесов аналитика. SQL, Python, A/B, метрики. Бесплатно.

Зачем это знать

Cohort analysis + retention — foundation продуктовой аналитики. Без них — нет понимания health продукта.

На собесах типа «напишите SQL для retention matrix» — ожидаемо.

Что такое cohort

Group пользователей, shares characteristic + time.

Примеры:

  • Signup month
  • Acquisition channel
  • Product version first used

Cohort analysis

Analyze behavior groups over time.

Output: cohort retention matrix:

Cohort M0 M1 M2 M3
Jan 100% 45% 30% 25%
Feb 100% 50% 32% 27%

Each row — cohort, columns — offset.

Retention types

Classic retention

% users active в specific day/week/month после acquisition.

D1 / D7 / D30

Specific day. Mobile app classic.

N-day retention

Active precisely на day N.

Rolling retention

Active в day N или any later.

Bracket retention

Active в range [N-k, N+k].

Each — different picture.

Which to use

Subscription

Monthly: billed? Still active?

Mobile

D1 / D7 / D30 standard.

B2B SaaS

Weekly / monthly.

E-commerce

Depends. Purchase retention, visit retention differ.

SQL retention template

Step 1: Cohort

WITH cohorts AS (
    SELECT user_id, DATE_TRUNC('month', MIN(created_at)) AS cohort_month
    FROM users
    GROUP BY user_id
)

Step 2: Activity

, activity AS (
    SELECT DISTINCT user_id, DATE_TRUNC('month', created_at) AS active_month
    FROM events
)

Step 3: Join, compute offset

SELECT
    c.cohort_month,
    EXTRACT(MONTH FROM AGE(a.active_month, c.cohort_month)) +
        EXTRACT(YEAR FROM AGE(a.active_month, c.cohort_month)) * 12 AS months_since,
    COUNT(DISTINCT a.user_id) AS active
FROM cohorts c
JOIN activity a USING (user_id)
GROUP BY 1, 2;

Step 4: Normalize

, cohort_sizes AS (
    SELECT cohort_month, COUNT(*) AS size FROM cohorts GROUP BY 1
)
SELECT
    r.*,
    r.active * 100.0 / cs.size AS retention_pct
FROM retention r
JOIN cohort_sizes cs USING (cohort_month);

Важные pitfalls

Left-censoring

Users with missing early data — excluded?

Right-censoring

Users signed recently — don't have future data.

Separate analysis.

Cohort sizes

Newer cohorts smaller. Interpret carefully.

Survivorship bias

Old cohorts filtered — only long-term survivors.

Retention curve

Plot all cohorts together:

  • Average curve
  • Individual cohort curves

Shape:

  • Smiley: some users come back later (good)
  • Frowning: steady drop, no return (bad)
  • Flat: eventually stabilize

Goal: flat plateau > 0%.

Retention по segments

-- По платформе
SELECT
    platform,
    cohort_month,
    months_since,
    active_pct
FROM retention_segmented
WHERE platform IN ('iOS', 'Android', 'Web');

iOS vs Android vs Web — different patterns often.

Cohort vs A/B

A/B test

Compare two groups randomized. Cohort size similar.

Cohort analysis

Sequential cohorts (January, February). Not randomized — trends, effects.

Predicting retention

Survival analysis

Cox Proportional Hazards. Predict time-to-churn.

ML models

Features → predicted retention.

Bass diffusion

Classical forecasting new product adoption.

Product impact

Retention — lagging indicator.

Leading indicators:

  • Activation
  • Early engagement
  • First purchase

Improve → retention follows.

A/B retention

D7 retention A/B

Quick. Treatment vs control D7.

D30 retention A/B

Long. Often need month+ data.

Cumulative metric

«Users retained в 30 days» — single number per user.

T-test or chi-square на proportions.

В tools

Amplitude

Built-in cohort retention visualization.

Mixpanel

Classic retention reports.

Custom SQL

Most flexible. See above.

Новая cohort

Ship feature → new cohort. Compare их retention к old cohorts.

Better? → feature positive.

Worse? → concerning.

Careful с confounders (seasonality, concurrent changes).

Cohort LTV

LTV на cohort:

WITH cohort_spend AS (
    SELECT
        cohort_month,
        user_id,
        SUM(amount) AS total_spent
    FROM users
    JOIN orders USING (user_id)
    WHERE order_date <= signup_date + INTERVAL '365 days'
    GROUP BY 1, 2
)
SELECT
    cohort_month,
    AVG(total_spent) AS avg_ltv_365d
FROM cohort_spend
GROUP BY cohort_month;

Track LTV changes cohort to cohort — key growth signal.

На собесе

«Напишите SQL retention matrix».

Walk: cohort CTE → activity CTE → offset → aggregate → normalize.

«Cohort vs A/B?» Cohort — sequential groups observed. A/B — random assigned.

«Retention D7 = 30%. Хорошо?» Depends на industry. Mobile casual game ok. SaaS poor.

«Retention = проблема. Что делать?» Segment → find worst → investigate causes → A/B interventions.

Частые ошибки

Daily cohort на low-traffic

10 users в day → noisy. Aggregate weekly.

Ignore survivors

Old cohorts — только successful. Current data compared wrongly.

Mix retention types

«Cohort 1 D7, Cohort 2 D30» — incomparable. Standardize.

Not adjusting

Seasonality, promotions change retention. Adjust или acknowledge.

Связанные темы

FAQ

Cohort size минимум?

100+ для meaningful stats. Less — noise.

Monthly или weekly?

Monthly SaaS, weekly high-traffic, daily mobile games.

Как long track?

At least 3x expected churn timeframe. E.g., quarterly churn → 1 year cohort.


Тренируйте — откройте тренажёр с 1500+ вопросами для собесов.