Retention и cohort на собесе маркетинг-аналитика
Зачем retention и cohort на собесе
Retention — главная метрика sustainability продукта. На собесе маркетинг-аналитика retention спрашивают и в SQL-кейсах, и в продуктовых вопросах. Без сильной retention acquisition = leaky bucket.
Слабый ответ — «считаю % active». Сильный — про cohort-based retention, retention curves shape, churn analysis, segmentation.
Виды retention
N-day retention: D1, D7, D30 — % users, активных через N дней после signup.
Rolling retention: % users, активных за last N days.
Range retention: % users, активных в week 4 (e.g., days 22-28).
Resurrection: % churned users, returned.
В Карьернике: D7 (был ли user через неделю после signup), M1 (месячный), M3 (квартальный).
Cohort retention table
Стандартный вид:
| Cohort | M0 | M1 | M2 | M3 | M6 |
|---|---|---|---|---|---|
| Jan'24 | 100% | 40% | 30% | 25% | 20% |
| Feb'24 | 100% | 42% | 32% | 27% | 21% |
| Mar'24 | 100% | 45% | 35% | 30% | ? |
| Apr'24 | 100% | 38% | 28% | 25% | ? |
Что смотреть:
- Improving (newer cohorts retain better) — product getting better
- Stable
- Deteriorating — что-то сломали
Retention curve shape
Hockey stick (стабилизация): падает быстро, потом стабилизируется. Healthy. Bottom часть — long-term retention.
Smile curve: падает, потом growing (resurrection / re-engagement). Reactivation работает.
Cliff: падает к 0. Что-то fundamental wrong.
No curve (flat at 100%): что-то tracking сломано.
Cohort retention в SQL
WITH cohorts AS (
SELECT user_id, DATE_TRUNC('month', MIN(event_time)) AS cohort_month
FROM events WHERE event_name = 'signup'
GROUP BY user_id
),
activity AS (
SELECT c.cohort_month, e.user_id,
EXTRACT(YEAR FROM AGE(DATE_TRUNC('month', e.event_time), c.cohort_month)) * 12 +
EXTRACT(MONTH FROM AGE(DATE_TRUNC('month', e.event_time), c.cohort_month)) AS months_since
FROM cohorts c JOIN events e USING (user_id)
WHERE e.event_name = 'session_start'
),
sizes AS (
SELECT cohort_month, COUNT(DISTINCT user_id) AS cohort_size
FROM cohorts GROUP BY 1
)
SELECT a.cohort_month, a.months_since,
COUNT(DISTINCT a.user_id) * 1.0 / s.cohort_size AS retention
FROM activity a JOIN sizes s USING (cohort_month)
GROUP BY 1, 2, s.cohort_size
ORDER BY 1, 2;Churn rate
Простой: (users_start_month - users_end_month + new_users) / users_start_month. Wrong для растущих.
Logo churn: % customers ушли. Standard для SaaS.
Revenue churn (gross / net):
- Gross: revenue lost from churn / starting revenue
- Net: gross - expansion revenue
- Net < 0 (= expansion больше churn) — лучший показатель
RFM segmentation
Recency — давно ли user был активен Frequency — как часто покупает Monetary — сколько тратит
Скоры 1-5 на каждой dimension → 5×5×5 = 125 сегментов.
Use cases:
- «Champions» (5/5/5): VIP treatment
- «At risk» (1/3/4): win-back campaign
- «Lost» (1/1/1): reactivation
В SQL:
SELECT user_id,
NTILE(5) OVER (ORDER BY DATEDIFF(today, last_purchase_date) DESC) AS r_score,
NTILE(5) OVER (ORDER BY orders_count) AS f_score,
NTILE(5) OVER (ORDER BY total_revenue) AS m_score
FROM user_summary;Causal analysis of churn
Why users churn?
Approach:
- Define churn (no activity 30 days)
- Sample churned users + control (still active)
- Look at activity differences before churn moment
- Hypothesis: missing feature usage, bug, support issue
- Validate qualitative (interviews) + quantitative
Common churn drivers:
- Onboarding не завершён
- Activation milestone не reached
- Feature не used
- Support ticket unresolved
- Pricing change
Survival analysis
Cox proportional hazards для retention modeling.
- Hazard function: instantaneous churn rate
- Covariates: features (engagement, plan, etc.) → effect on retention
- Kaplan-Meier curves для cohort comparison
Tools: lifelines (Python), survminer (R).
Типичные вопросы
«D7 retention 30%. Хорошо?»
Depends. Game / social: 40%+ норма. SaaS / utility: 60%+. Если 30% для SaaS — flag.
«Cohort 2024-Q4 retention dropped. Investigate.»
- Acquisition source — изменилось? Новые каналы с худшей audience?
- Product change — что выкатили?
- Seasonality — Q4 шопинг шум?
- Tracking — не сломалось ли?
«Net Revenue Retention 110%. Что это значит?»
Existing customers expansion (upsell + price increase) больше, чем churn. Holy grail SaaS — > 100% = revenue растёт без новых customers.
«User не active 7 days. Churned?»
Depends on usage pattern. Daily product → возможно. Monthly product → нет. Define churn relative к expected frequency.
Частые ошибки
- Cohort by signup date, but treating as cohort by acquisition source. Different cohorts.
- Daily / weekly mixed. Stick к one cadence.
- Without normalize cohort size. Сравниваем absolute counts (big cohort просто больше).
- D1 без context. Game D1 80%, SaaS B2B 40% — normal.
- Churn formula naive. Растущие companies — formula breaks.
FAQ
D vs M retention?
D — для high-frequency products (game, social, ride-share). M — для subscription, B2B.
Cohort by signup OR by first transaction?
Зависит от продукта. Карьерник: signup-based (free → eventually paid). E-com: first-transaction.
Какие tools для cohort?
PostHog / Amplitude / Mixpanel — built-in. Tableau / DataLens / Looker — на DWH. SQL — самое flexible.
Predicting churn — нужно ли?
Для outreach («at-risk users» → campaign). ML model (churn probability) — middle / senior DS / DA задача.
Retention vs engagement?
Retention — bool (came back?). Engagement — degree (how much used?). Both matter.