SQL для сегментации пользователей
Карьерник — квиз-тренажёр в Telegram с 1500+ вопросами для собесов аналитика. SQL, Python, A/B, метрики. Бесплатно.
Зачем это знать
Сегментация users — базовая задача marketing / product. Кого retarget, кому email, кому premium offer. Аналитик без segmentation skills — ограничен.
На собесах в retail / e-commerce / subscription businesses segmentation — bread and butter.
Типы сегментации
1. Demographic
Age, gender, location, company size (B2B).
2. Behavioral
Actions, frequency, recency. Most actionable.
3. RFM
Recency, Frequency, Monetary — classic для e-commerce.
4. Psychographic
Interests, lifestyle. Harder в quantify.
5. Tier-based
Free / Basic / Premium. Straightforward.
RFM в SQL
WITH rfm_raw AS (
SELECT
user_id,
MAX(created_at) AS last_purchase,
COUNT(*) AS frequency,
SUM(amount) AS monetary
FROM orders
WHERE created_at >= CURRENT_DATE - 365
GROUP BY user_id
),
rfm_scores AS (
SELECT
user_id,
NTILE(5) OVER (ORDER BY last_purchase DESC) AS r_score,
NTILE(5) OVER (ORDER BY frequency DESC) AS f_score,
NTILE(5) OVER (ORDER BY monetary DESC) AS m_score
FROM rfm_raw
)
SELECT
user_id,
r_score, f_score, m_score,
r_score * 100 + f_score * 10 + m_score AS rfm_code,
CASE
WHEN r_score = 5 AND f_score = 5 AND m_score = 5 THEN 'Champions'
WHEN r_score >= 4 AND f_score >= 4 AND m_score >= 4 THEN 'Loyal'
WHEN r_score = 5 AND f_score <= 2 THEN 'New Customers'
WHEN r_score <= 2 AND f_score >= 4 THEN 'At Risk'
WHEN r_score = 1 THEN 'Churned'
ELSE 'Normal'
END AS segment
FROM rfm_scores;Behavioral
Power user
SELECT
user_id,
CASE
WHEN sessions_30d >= 20 AND minutes_30d >= 300 THEN 'Power'
WHEN sessions_30d >= 10 THEN 'Regular'
WHEN sessions_30d >= 3 THEN 'Occasional'
ELSE 'Casual'
END AS segment
FROM (
SELECT
user_id,
COUNT(*) AS sessions_30d,
SUM(duration_min) AS minutes_30d
FROM sessions
WHERE created_at >= CURRENT_DATE - 30
GROUP BY user_id
) s;Engagement tiers
Percentile-based:
SELECT
user_id,
NTILE(4) OVER (ORDER BY engagement_score DESC) AS tier
FROM (
SELECT user_id, SUM(actions_count) AS engagement_score
FROM events
WHERE created_at >= CURRENT_DATE - 30
GROUP BY user_id
) s;1 = top 25%, 4 = bottom 25%.
New vs returning
SELECT
user_id,
CASE
WHEN DATE_DIFF('day', signup_at, CURRENT_DATE) <= 30 THEN 'New'
WHEN last_active >= CURRENT_DATE - 30 THEN 'Active Returning'
WHEN last_active >= CURRENT_DATE - 90 THEN 'At Risk'
ELSE 'Churned'
END AS lifecycle_stage
FROM users;Premium / Free
SELECT
user_id,
CASE
WHEN current_plan = 'premium' THEN 'Premium'
WHEN free_trial_active THEN 'Trial'
WHEN premium_history THEN 'Former Premium'
ELSE 'Free'
END AS subscription_segment
FROM users;Combined segments
WITH segments AS (
SELECT
user_id,
rfm_segment,
behavior_segment,
lifecycle_stage
FROM segmentation_combined
)
SELECT
rfm_segment,
lifecycle_stage,
COUNT(*) AS users,
AVG(ltv) AS avg_ltv
FROM segments
GROUP BY rfm_segment, lifecycle_stage
ORDER BY 1, 2;Crosstab shows interactions.
Clustering через SQL (approximation)
Simple K-means-like через NTILE:
WITH features AS (
SELECT
user_id,
NTILE(3) OVER (ORDER BY recency DESC) AS r,
NTILE(3) OVER (ORDER BY frequency DESC) AS f,
NTILE(3) OVER (ORDER BY monetary DESC) AS m
FROM rfm
)
SELECT
r, f, m,
COUNT(*) AS segment_size,
AVG(ltv) AS avg_ltv
FROM features
GROUP BY r, f, m
ORDER BY avg_ltv DESC;27 «clusters». Больше — harder.
Для proper clustering — use Python K-means.
Propensity scoring
Score по model → segment by threshold:
SELECT
user_id,
churn_score,
CASE
WHEN churn_score >= 0.8 THEN 'High Risk'
WHEN churn_score >= 0.5 THEN 'Medium Risk'
ELSE 'Low Risk'
END AS churn_segment
FROM users_with_scores;Churn_score from ML model loaded.
Actionable segments
Каждый segment → action:
- Power users: increase engagement, referrals, case studies
- New customers: welcome journey, first-purchase discount
- At Risk: re-engagement email, win-back offer
- Churned: major discount, product updates
Validation
Business sense
Segments должны быть actionable, distinct, stable.
Size distribution
5% power, 20% regular, 40% casual, 35% occasional — reasonable.
10% super-power, 0.1% at risk — might be mis-defined.
Behavior differs
Segments должны differ in KPIs significantly.
Частые ошибки
Too many segments
20 segments → ops team confused.
Keep 5-7 max.
Static definitions
Segments меняются over time. Refresh periodically.
Single dimension
«High LTV» only → miss engagement. Combine dimensions.
No business adoption
Best segmentation — none if не used. Align with marketing / CS.
На собесе
«Как бы сегментировали users для [product]?»
Walk through:
- Business goal (retention? LTV?)
- Features relevant
- Method (rule-based vs data-driven)
- Sizes
- Actions
Show product thinking.
Связанные темы
FAQ
SQL или Python?
SQL для простых. Python для ML-clustering.
Как часто обновлять?
Weekly / monthly. Слишком often — unstable для marketing workflows.
Overlap ok?
Hierarchical (tier) — no. Cross-cutting (power × premium) — often yes.
Тренируйте SQL — откройте тренажёр с 1500+ вопросами для собесов.