Cohort analysis in a Data Scientist interview
Карьерник is Duolingo for analysts: practice SQL, Python, A/B testing, statistics, metrics and 3 more interview topics for 10 minutes a day. 1500+ questions in a Telegram bot. Free.
Why it comes up in interviews
Cohort analysis is a standard tool of product analytics. In DS interviews it shows up as "How do you calculate retention?" and "Why use cohorts at all?"
What a cohort is
A cohort is a group of users who share a characteristic, usually their signup date.
Acquisition cohort. Everyone who signed up in a given week or month X.
Behavioral cohort. Everyone who performed action X (made a purchase, completed onboarding).
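A minimal Python sketch of the two cohort types, assuming hypothetical in-memory data (signups and purchases dictionaries keyed by user ID) and monthly cohorts. The acquisition cohort is taken from the signup month, the behavioral cohort from the month of the first purchase:

```python
from datetime import date

# Hypothetical raw data (not from the article): signup date and purchase dates per user.
signups = {
    "u1": date(2026, 4, 28),
    "u2": date(2026, 5, 3),
    "u3": date(2026, 5, 20),
}
purchases = {
    "u1": [date(2026, 6, 10), date(2026, 5, 2)],
    "u3": [date(2026, 6, 1)],
}


def month_key(d: date) -> str:
    """Truncate a date to a 'YYYY-MM' cohort label."""
    return f"{d.year:04d}-{d.month:02d}"


# Acquisition cohort: the month the user signed up.
acquisition = {uid: month_key(d) for uid, d in signups.items()}

# Behavioral cohort: the month of the user's FIRST purchase.
# Users with no purchases (u2 here) belong to no purchase cohort.
behavioral = {uid: month_key(min(ds)) for uid, ds in purchases.items() if ds}

print(acquisition)  # {'u1': '2026-04', 'u2': '2026-05', 'u3': '2026-05'}
print(behavioral)   # {'u1': '2026-05', 'u3': '2026-06'}
```

The same user can sit in one acquisition cohort and a different behavioral cohort (u1 signed up in April but first purchased in May).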
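Why use cohorts:
- Tracking a metric over time without survivorship bias: a metric averaged over all currently active users can move simply because some users have churned.
- Comparing cohorts: is the product improving or degrading?
- Understanding the impact of product changes.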
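To illustrate the survivorship-bias point, here is a toy Python example with made-up numbers: one cohort in which no user ever changes their spend, but low spenders churn after the first month. Averaging only over still-active users (ARPU of survivors, a hypothetical metric choice) suggests growth, while dividing by the original cohort size shows the cohort actually earns less:

```python
# Hypothetical cohort of 10 users: (monthly spend, months active before churning).
# Nobody's spend ever changes; low spenders simply churn earlier.
cohort = [(5.0, 1)] * 6 + [(20.0, 3)] * 4


def arpu_of_survivors(users, month):
    """Average spend among users still active in `month` (survivor view)."""
    active = [spend for spend, lifetime in users if lifetime > month]
    return sum(active) / len(active)


def revenue_per_cohort_user(users, month):
    """Revenue in `month` divided by the ORIGINAL cohort size (cohort view)."""
    return sum(spend for spend, lifetime in users if lifetime > month) / len(users)


survivor_view = [arpu_of_survivors(cohort, m) for m in range(3)]
cohort_view = [revenue_per_cohort_user(cohort, m) for m in range(3)]
print(survivor_view)  # [11.0, 20.0, 20.0]  looks like growth
print(cohort_view)    # [11.0, 8.0, 8.0]    the cohort actually earns less
```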
Retention cohort
The most common type.
Cohort: users who signed up in May 2026
Day 0: 100% (all 1000 users)
Day 1: 60%
Day 7: 40%
Day 30: 25%
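Day 90: 18%

SQL version with weekly cohorts (PostgreSQL syntax; assumes tables users(user_id, signup_date) and events(user_id, event_date) with DATE columns). The denominator is the full cohort size rather than the week-0 active count, and week numbers are counted from each user's own signup date:

```sql
-- Weekly retention by acquisition cohort (PostgreSQL)
WITH cohorts AS (
    SELECT user_id,
           signup_date,
           DATE_TRUNC('week', signup_date)::date AS cohort_week
    FROM users
),
cohort_size AS (
    -- Denominator: everyone who signed up that week, active or not
    SELECT cohort_week, COUNT(*) AS n_users
    FROM cohorts
    GROUP BY cohort_week
),
retention AS (
    SELECT c.cohort_week,
           (e.event_date - c.signup_date) / 7 AS week_n,  -- date - date = whole days
           COUNT(DISTINCT e.user_id) AS active
    FROM cohorts c
    JOIN events e ON e.user_id = c.user_id AND e.event_date >= c.signup_date
    GROUP BY 1, 2
)
SELECT r.cohort_week,
       r.week_n,
       ROUND(100.0 * r.active / s.n_users, 1) AS retention_pct
FROM retention r
JOIN cohort_size s ON s.cohort_week = r.cohort_week
ORDER BY r.cohort_week, r.week_n;
```

Behavioral cohort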
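Users are grouped by an action they performed, not by signup date.
Examples:
- Users who made their first purchase in May 2026.
- Users who completed the onboarding tutorial.
- Users who upgraded to premium.
Their retention is then compared with a control group of users who did not perform the action. This comparison is observational: users who choose to do the action may differ from the rest in other ways, so the gap is not automatically a causal effect.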
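A minimal Python sketch of that comparison, using hypothetical per-user flags (did the user complete onboarding, was the user active on day 7). It computes day-7 retention for each group and the raw difference, which is a descriptive gap rather than a causal estimate:

```python
# Hypothetical per-user flags: (completed onboarding?, active on day 7 after signup?)
users = [
    (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False), (False, False),
]


def day7_retention(group):
    """Share of users in `group` who were active on day 7."""
    return sum(active for _, active in group) / len(group)


did_action = [u for u in users if u[0]]   # behavioral cohort
control = [u for u in users if not u[0]]  # did not perform the action

gap = day7_retention(did_action) - day7_retention(control)
print(f"{day7_retention(did_action):.2f} vs {day7_retention(control):.2f}, gap {gap:+.2f}")
# 0.67 vs 0.25, gap +0.42
```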
Triangle visualization
Standard cohort table:
Cohort | Day 0 | Day 1 | Day 7 | Day 30 | Day 90
2026-Apr | 100% | 65% | 42% | 28% | 20%
2026-May | 100% | 68% | 45% | 30% | --
2026-Jun | 100% | 70% | 47% | -- | --
2026-Jul | 100% | 72% | -- | -- | --

How to read it: going down, cohorts get newer; going right, time since acquisition grows; empty cells (--) mean the cohort has not lived that long yet.
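The triangle can be built from raw activity data. A sketch in Python, assuming hypothetical inputs (a signup date and a set of active dates per user, plus an as_of date marking the last complete day of data). Day-N retention is taken here as "active exactly N days after signup", one common definition among several, and a cell stays empty (None) until every user in the cohort has reached day N:

```python
from collections import defaultdict
from datetime import date, timedelta

HORIZONS = [0, 1, 7, 30]  # days since signup, matching the table columns


def cohort_triangle(signups, activity, as_of):
    """Return {cohort_month: [retention % per horizon, or None if censored]}.

    signups:  {user_id: signup date}                 (hypothetical input shape)
    activity: {user_id: set of dates the user was active}
    as_of:    last date with complete data
    The signup day itself counts as active, so day 0 is always 100%.
    """
    cohorts = defaultdict(list)
    for uid, d in signups.items():
        cohorts[f"{d.year:04d}-{d.month:02d}"].append(uid)

    table = {}
    for label in sorted(cohorts):
        uids = cohorts[label]
        row = []
        for n in HORIZONS:
            # Censored: some users in this cohort have not lived n days yet.
            if any((as_of - signups[u]).days < n for u in uids):
                row.append(None)
                continue
            hits = 0
            for u in uids:
                active_days = activity.get(u, set()) | {signups[u]}
                if signups[u] + timedelta(days=n) in active_days:
                    hits += 1
            row.append(round(100 * hits / len(uids), 1))
        table[label] = row
    return table


# Hypothetical example data
signups = {"a": date(2026, 5, 1), "b": date(2026, 5, 10), "c": date(2026, 6, 2)}
activity = {
    "a": {date(2026, 5, 2), date(2026, 5, 8)},
    "b": {date(2026, 5, 11)},
    "c": {date(2026, 6, 3)},
}
triangle = cohort_triangle(signups, activity, as_of=date(2026, 6, 5))
for label, row in triangle.items():
    print(label, ["--" if v is None else f"{v:.0f}%" for v in row])
# 2026-05 ['100%', '100%', '50%', '--']
# 2026-06 ['100%', '100%', '--', '--']
```

Requiring the whole cohort to reach day N is a conservative choice; an alternative is to compute each cell over only the users old enough, at the cost of mixing different subsets of the cohort across columns.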
Trends:
- Newer cohorts retain better: good, product improvements are working.
- Newer cohorts drop off faster: bad, the quality of acquired users is getting worse.
- Retention stays the same: stagnation.
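A rough way to put the trend check into code: compare retention at a fixed horizon across cohorts, oldest to newest. The sketch uses the Day 1 column of the cohort table and a hypothetical tolerance of 1 percentage point. It only looks at the average change between the first and last cohort; a real analysis would also check whether the difference is statistically meaningful:

```python
# Day 1 retention (%) per cohort, oldest first, from the cohort table.
day1 = {"2026-Apr": 65, "2026-May": 68, "2026-Jun": 70, "2026-Jul": 72}


def cohort_trend(values, tolerance=1.0):
    """Classify cohort retention at a fixed horizon as improving/degrading/flat.

    `tolerance` (percentage points per cohort) is a hypothetical noise threshold.
    """
    values = list(values)
    slope = (values[-1] - values[0]) / (len(values) - 1)  # avg change per cohort
    if slope > tolerance:
        return "improving"
    if slope < -tolerance:
        return "degrading"
    return "flat"


print(cohort_trend(day1.values()))  # improving
```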
Related topics
- Survival analysis in a DS interview
- Causal inference for DS
- Funnel conversion for PMs
- Activation rate for PMs
- Preparing for a Data Scientist interview
FAQ
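How do you handle censoring?
The newest cohorts have not reached day 30 yet, so those cells are empty. Treat them as missing data, not as 0% retention; counting them as zeros would drag any average down.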
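A small Python illustration using the Day 30 column of the cohort table, with None for the censored cells. Averaging only the cohorts that have reached day 30 gives 29%; filling the censored cells with zeros halves the figure. The mean is unweighted for simplicity; weighting by cohort size would be more accurate:

```python
# Day 30 retention (%) per cohort; None marks censored cells ("--").
day30 = {"2026-Apr": 28, "2026-May": 30, "2026-Jun": None, "2026-Jul": None}

observed = [v for v in day30.values() if v is not None]
correct_mean = sum(observed) / len(observed)                   # only cohorts old enough
wrong_mean = sum(v or 0 for v in day30.values()) / len(day30)  # censored counted as 0%

print(correct_mean, wrong_mean)  # 29.0 14.5
```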
Is this official information?
No. The article is based on standard product analytics practice.