Шпаргалка по cohorts и retention
Карьерник — квиз-тренажёр в Telegram с 1500+ вопросами для собесов аналитика. SQL, Python, A/B, метрики. Бесплатно.
Зачем это знать
Cohort analysis + retention — foundation продуктовой аналитики. Без них — нет понимания health продукта.
На собесах типа «напишите SQL для retention matrix» — ожидаемо.
Что такое cohort
Group пользователей, shares characteristic + time.
Примеры:
- Signup month
- Acquisition channel
- Product version first used
Cohort analysis
Analyze behavior groups over time.
Output: cohort retention matrix:
| Cohort | M0 | M1 | M2 | M3 |
|---|---|---|---|---|
| Jan | 100% | 45% | 30% | 25% |
| Feb | 100% | 50% | 32% | 27% |
Each row — cohort, columns — offset.
Retention types
Classic retention
% users active в specific day/week/month после acquisition.
D1 / D7 / D30
Specific day. Mobile app classic.
N-day retention
Active precisely на day N.
Rolling retention
Active в day N или any later.
Bracket retention
Active в range [N-k, N+k].
Each — different picture.
Which to use
Subscription
Monthly: billed? Still active?
Mobile
D1 / D7 / D30 standard.
B2B SaaS
Weekly / monthly.
E-commerce
Depends. Purchase retention, visit retention differ.
SQL retention template
Step 1: Cohort
WITH cohorts AS (
SELECT user_id, DATE_TRUNC('month', MIN(created_at)) AS cohort_month
FROM users
GROUP BY user_id
)Step 2: Activity
, activity AS (
SELECT DISTINCT user_id, DATE_TRUNC('month', created_at) AS active_month
FROM events
)Step 3: Join, compute offset
SELECT
c.cohort_month,
EXTRACT(MONTH FROM AGE(a.active_month, c.cohort_month)) +
EXTRACT(YEAR FROM AGE(a.active_month, c.cohort_month)) * 12 AS months_since,
COUNT(DISTINCT a.user_id) AS active
FROM cohorts c
JOIN activity a USING (user_id)
GROUP BY 1, 2;Step 4: Normalize
, cohort_sizes AS (
SELECT cohort_month, COUNT(*) AS size FROM cohorts GROUP BY 1
)
SELECT
r.*,
r.active * 100.0 / cs.size AS retention_pct
FROM retention r
JOIN cohort_sizes cs USING (cohort_month);Важные pitfalls
Left-censoring
Users with missing early data — excluded?
Right-censoring
Users signed recently — don't have future data.
Separate analysis.
Cohort sizes
Newer cohorts smaller. Interpret carefully.
Survivorship bias
Old cohorts filtered — only long-term survivors.
Retention curve
Plot all cohorts together:
- Average curve
- Individual cohort curves
Shape:
- Smiley: some users come back later (good)
- Frowning: steady drop, no return (bad)
- Flat: eventually stabilize
Goal: flat plateau > 0%.
Retention по segments
-- По платформе
SELECT
platform,
cohort_month,
months_since,
active_pct
FROM retention_segmented
WHERE platform IN ('iOS', 'Android', 'Web');iOS vs Android vs Web — different patterns often.
Cohort vs A/B
A/B test
Compare two groups randomized. Cohort size similar.
Cohort analysis
Sequential cohorts (January, February). Not randomized — trends, effects.
Predicting retention
Survival analysis
Cox Proportional Hazards. Predict time-to-churn.
ML models
Features → predicted retention.
Bass diffusion
Classical forecasting new product adoption.
Product impact
Retention — lagging indicator.
Leading indicators:
- Activation
- Early engagement
- First purchase
Improve → retention follows.
A/B retention
D7 retention A/B
Quick. Treatment vs control D7.
D30 retention A/B
Long. Often need month+ data.
Cumulative metric
«Users retained в 30 days» — single number per user.
T-test or chi-square на proportions.
В tools
Amplitude
Built-in cohort retention visualization.
Mixpanel
Classic retention reports.
Custom SQL
Most flexible. See above.
Новая cohort
Ship feature → new cohort. Compare их retention к old cohorts.
Better? → feature positive.
Worse? → concerning.
Careful с confounders (seasonality, concurrent changes).
Cohort LTV
LTV на cohort:
WITH cohort_spend AS (
SELECT
cohort_month,
user_id,
SUM(amount) AS total_spent
FROM users
JOIN orders USING (user_id)
WHERE order_date <= signup_date + INTERVAL '365 days'
GROUP BY 1, 2
)
SELECT
cohort_month,
AVG(total_spent) AS avg_ltv_365d
FROM cohort_spend
GROUP BY cohort_month;Track LTV changes cohort to cohort — key growth signal.
На собесе
«Напишите SQL retention matrix».
Walk: cohort CTE → activity CTE → offset → aggregate → normalize.
«Cohort vs A/B?» Cohort — sequential groups observed. A/B — random assigned.
«Retention D7 = 30%. Хорошо?» Depends на industry. Mobile casual game ok. SaaS poor.
«Retention = проблема. Что делать?» Segment → find worst → investigate causes → A/B interventions.
Частые ошибки
Daily cohort на low-traffic
10 users в day → noisy. Aggregate weekly.
Ignore survivors
Old cohorts — только successful. Current data compared wrongly.
Mix retention types
«Cohort 1 D7, Cohort 2 D30» — incomparable. Standardize.
Not adjusting
Seasonality, promotions change retention. Adjust или acknowledge.
Связанные темы
- SQL для когортного анализа
- Как считать retention
- Задачи на cohort analysis
- Retention простыми словами
- Cohort analysis
FAQ
Cohort size минимум?
100+ для meaningful stats. Less — noise.
Monthly или weekly?
Monthly SaaS, weekly high-traffic, daily mobile games.
Как long track?
At least 3x expected churn timeframe. E.g., quarterly churn → 1 year cohort.
Тренируйте — откройте тренажёр с 1500+ вопросами для собесов.