Retention и cohort на собесе маркетинг-аналитика

Зачем retention и cohort на собесе

Retention — главная метрика sustainability продукта. На собесе маркетинг-аналитика retention спрашивают и в SQL-кейсах, и в продуктовых вопросах. Без сильной retention acquisition = leaky bucket.

Слабый ответ — «считаю % active». Сильный — про cohort-based retention, retention curves shape, churn analysis, segmentation.

Виды retention

N-day retention: D1, D7, D30 — % users, активных через N дней после signup.

Rolling retention: % users, активных за last N days.

Range retention: % users, активных в week 4 (e.g., days 22-28).

Resurrection: % churned users, returned.

В Карьернике: D7 (был ли user через неделю после signup), M1 (месячный), M3 (квартальный).

Cohort retention table

Стандартный вид:

Cohort M0 M1 M2 M3 M6
Jan'24 100% 40% 30% 25% 20%
Feb'24 100% 42% 32% 27% 21%
Mar'24 100% 45% 35% 30% ?
Apr'24 100% 38% 28% 25% ?

Что смотреть:

  • Improving (newer cohorts retain better) — product getting better
  • Stable
  • Deteriorating — что-то сломали

Retention curve shape

Hockey stick (стабилизация): падает быстро, потом стабилизируется. Healthy. Bottom часть — long-term retention.

Smile curve: падает, потом growing (resurrection / re-engagement). Reactivation работает.

Cliff: падает к 0. Что-то fundamental wrong.

No curve (flat at 100%): что-то tracking сломано.

Cohort retention в SQL

WITH cohorts AS (
  SELECT user_id, DATE_TRUNC('month', MIN(event_time)) AS cohort_month
  FROM events WHERE event_name = 'signup'
  GROUP BY user_id
),
activity AS (
  SELECT c.cohort_month, e.user_id,
    EXTRACT(YEAR FROM AGE(DATE_TRUNC('month', e.event_time), c.cohort_month)) * 12 +
    EXTRACT(MONTH FROM AGE(DATE_TRUNC('month', e.event_time), c.cohort_month)) AS months_since
  FROM cohorts c JOIN events e USING (user_id)
  WHERE e.event_name = 'session_start'
),
sizes AS (
  SELECT cohort_month, COUNT(DISTINCT user_id) AS cohort_size
  FROM cohorts GROUP BY 1
)
SELECT a.cohort_month, a.months_since,
  COUNT(DISTINCT a.user_id) * 1.0 / s.cohort_size AS retention
FROM activity a JOIN sizes s USING (cohort_month)
GROUP BY 1, 2, s.cohort_size
ORDER BY 1, 2;

Churn rate

Простой: (users_start_month - users_end_month + new_users) / users_start_month. Wrong для растущих.

Logo churn: % customers ушли. Standard для SaaS.

Revenue churn (gross / net):

  • Gross: revenue lost from churn / starting revenue
  • Net: gross - expansion revenue
  • Net < 0 (= expansion больше churn) — лучший показатель

RFM segmentation

Recency — давно ли user был активен Frequency — как часто покупает Monetary — сколько тратит

Скоры 1-5 на каждой dimension → 5×5×5 = 125 сегментов.

Use cases:

  • «Champions» (5/5/5): VIP treatment
  • «At risk» (1/3/4): win-back campaign
  • «Lost» (1/1/1): reactivation

В SQL:

SELECT user_id,
  NTILE(5) OVER (ORDER BY DATEDIFF(today, last_purchase_date) DESC) AS r_score,
  NTILE(5) OVER (ORDER BY orders_count) AS f_score,
  NTILE(5) OVER (ORDER BY total_revenue) AS m_score
FROM user_summary;

Causal analysis of churn

Why users churn?

Approach:

  1. Define churn (no activity 30 days)
  2. Sample churned users + control (still active)
  3. Look at activity differences before churn moment
  4. Hypothesis: missing feature usage, bug, support issue
  5. Validate qualitative (interviews) + quantitative

Common churn drivers:

  • Onboarding не завершён
  • Activation milestone не reached
  • Feature не used
  • Support ticket unresolved
  • Pricing change

Survival analysis

Cox proportional hazards для retention modeling.

  • Hazard function: instantaneous churn rate
  • Covariates: features (engagement, plan, etc.) → effect on retention
  • Kaplan-Meier curves для cohort comparison

Tools: lifelines (Python), survminer (R).

Типичные вопросы

«D7 retention 30%. Хорошо?»

Depends. Game / social: 40%+ норма. SaaS / utility: 60%+. Если 30% для SaaS — flag.

«Cohort 2024-Q4 retention dropped. Investigate.»

  1. Acquisition source — изменилось? Новые каналы с худшей audience?
  2. Product change — что выкатили?
  3. Seasonality — Q4 шопинг шум?
  4. Tracking — не сломалось ли?

«Net Revenue Retention 110%. Что это значит?»

Existing customers expansion (upsell + price increase) больше, чем churn. Holy grail SaaS — > 100% = revenue растёт без новых customers.

«User не active 7 days. Churned?»

Depends on usage pattern. Daily product → возможно. Monthly product → нет. Define churn relative к expected frequency.

Частые ошибки

  • Cohort by signup date, but treating as cohort by acquisition source. Different cohorts.
  • Daily / weekly mixed. Stick к one cadence.
  • Without normalize cohort size. Сравниваем absolute counts (big cohort просто больше).
  • D1 без context. Game D1 80%, SaaS B2B 40% — normal.
  • Churn formula naive. Растущие companies — formula breaks.

FAQ

D vs M retention?

D — для high-frequency products (game, social, ride-share). M — для subscription, B2B.

Cohort by signup OR by first transaction?

Зависит от продукта. Карьерник: signup-based (free → eventually paid). E-com: first-transaction.

Какие tools для cohort?

PostHog / Amplitude / Mixpanel — built-in. Tableau / DataLens / Looker — на DWH. SQL — самое flexible.

Predicting churn — нужно ли?

Для outreach («at-risk users» → campaign). ML model (churn probability) — middle / senior DS / DA задача.

Retention vs engagement?

Retention — bool (came back?). Engagement — degree (how much used?). Both matter.

Смотрите также