How to tell correlation from causation
Карьерник is a quiz trainer in Telegram with 1500+ questions for analyst interviews: SQL, Python, A/B testing, metrics. Free.
Why this matters
"Premium users have 80% retention, free users have 30%. Let's give everyone premium!" This is a typical junior mistake. In a senior analyst interview, this question is bread and butter.
Confusing correlation with causation is one of the most expensive analytical mistakes in business.
The short rule
Correlation ≠ causation.
Example: ice cream sales correlate with drowning incidents. That does not mean ice cream causes drowning; both share a common cause, summer.
Four possible explanations
If I see a correlation between X and Y, the possibilities are:
1. X → Y (causation)
X really does cause Y.
2. Y → X (reverse causation)
Y causes X. This one is deceptive, because the data look the same as in case 1.
3. Z → X and Z → Y (confounding)
A common cause Z drives both. The classic case.
4. Chance
A spurious correlation, especially likely in a small sample.
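Confounding example
Observation: smokers exercise more than non-smokers. Should we reject the hypothesis that smoking is unhealthy?
No. The confounder is age:
- Younger people smoke more and exercise more
- Older people smoke less and exercise less
If the pattern reverses once you compare within each age group, that is Simpson's paradox.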
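A sketch of the smokers example with made-up counts shows the reversal. The numbers are hypothetical and chosen only so that smokers are mostly young; the function name exercise_rate is likewise an assumption.

```python
# Hypothetical counts: (age group, smokes) -> (people, people who exercise)
counts = {
    ("young", True): (800, 480),
    ("young", False): (200, 140),
    ("older", True): (200, 40),
    ("older", False): (800, 240),
}

def exercise_rate(smokes, group=None):
    """Share of people who exercise, overall or within one age group."""
    rows = [v for (g, s), v in counts.items() if s == smokes and group in (None, g)]
    people = sum(n for n, _ in rows)
    exercising = sum(k for _, k in rows)
    return exercising / people

for group in (None, "young", "older"):
    label = group or "all ages"
    print(f"{label}: smokers {exercise_rate(True, group):.0%}, "
          f"non-smokers {exercise_rate(False, group):.0%}")
# all ages: smokers 52%, non-smokers 38%
# young: smokers 60%, non-smokers 70%
# older: smokers 20%, non-smokers 30%
```

In aggregate smokers look more active, but inside every age group they exercise less. The aggregate gap comes entirely from the age mix.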
Reverse causation example
Observation: rich people have good health.
Does being rich lead to better health, or does good health lead to higher earnings?
Both are plausible. Without an experiment you cannot tell which.
Tests for causation
1. RCT (randomized controlled trial)
The gold standard:
- Randomly assign the treatment
- Measure the outcome
- The difference between groups estimates the causal effect
A/B tests are RCTs for the web.
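For an A/B test on a conversion or retention rate, the "difference" step is often checked with a two-proportion z-test. The sketch below assumes users were already randomly assigned; the function name ab_test and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: returns (absolute lift of B over A, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, p_value

# Hypothetical result: 30.0% retained in control, 34.5% in treatment.
lift, p_value = ab_test(conv_a=300, n_a=1000, conv_b=345, n_b=1000)
print(f"lift: {lift:+.1%}, p-value: {p_value:.3f}")
```

Because assignment is random, the groups differ only by the treatment on average, so a significant lift can be read as a causal effect rather than a selection artifact.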
2. Natural experiments
Quasi-random events (a policy change, a lottery) that assign the treatment as if at random.
Typical methods: difference-in-differences (diff-in-diff), regression discontinuity.
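The diff-in-diff arithmetic fits in a few lines. The figures below are invented, and the estimate is only meaningful under the parallel-trends assumption: without the change, both groups would have moved by the same amount.

```python
# Hypothetical average weekly orders per user before and after a policy change.
treated = {"before": 10.0, "after": 14.0}  # region where the change shipped
control = {"before": 9.0, "after": 11.0}   # comparable region without the change

treated_change = treated["after"] - treated["before"]
control_change = control["after"] - control["before"]

# Subtract the control group's change to remove the shared time trend.
did_estimate = treated_change - control_change
print(did_estimate)  # 2.0
```

A naive before/after comparison in the treated region would report +4.0; half of that is the trend that the control region also experienced.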
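3. Instrumental variables
An instrument is a variable that affects the treatment, affects the outcome only through the treatment (not directly), and is unrelated to the confounders.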
4. Causal diagrams (DAGs)
Formal causal models drawn as directed acyclic graphs. This is Pearl's causal framework.
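Even a toy DAG in code can make confounders explicit. The sketch below is a simplification, not a full implementation of Pearl's criteria: it lists variables that cause the treatment and also reach the outcome by a path that does not go through the treatment. The graph, its node names, and the helper functions are hypothetical.

```python
# Hypothetical causal graph: cause -> list of direct effects.
dag = {
    "tenure": ["engagement"],
    "engagement": ["is_premium", "retention"],
    "ad_campaign": ["is_premium"],
    "is_premium": ["retention"],
    "retention": [],
}

def ancestors(graph, node):
    """All nodes with a directed path into `node`."""
    parents = {n for n, children in graph.items() if node in children}
    result = set(parents)
    for parent in parents:
        result |= ancestors(graph, parent)
    return result

def common_causes(graph, treatment, outcome):
    """Causes of the treatment that also reach the outcome without passing through it."""
    # Drop the treatment's outgoing edges so paths via the treatment do not count.
    pruned = {n: ([] if n == treatment else c) for n, c in graph.items()}
    return ancestors(graph, treatment) & ancestors(pruned, outcome)

print(sorted(common_causes(dag, "is_premium", "retention")))
# ['engagement', 'tenure']
```

Here ad_campaign is not flagged, because in this graph it influences retention only through is_premium.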
5. Temporal precedence
The cause must precede the effect.
But temporal order alone is not sufficient.
Real-world examples
"Premium users retain"
Correlation: obvious.
Causation? Maybe. Or:
- Engaged users (the confounder) both choose premium and retain
- They would retain anyway without premium
Test: run an A/B test that offers premium to a random group. If retention goes up in that group, the effect is causal.
"Users who add friends retain"
The aha moment for Facebook: 7 friends in 10 days.
Causation? Meta tested it by force-suggesting friends. If new users who get friends this way also retain better, the effect is causal.
"Support chats increase NPS"
Correlation: users who contacted support have higher NPS (surprising, since most companies see the opposite).
Causation? Maybe support fixed their issues, which built loyalty. Or only happy users contact support (selection).
Test: offer support to a random group and measure the NPS difference.
Confounder handling
Stratification
Split by the confounder and compare within each stratum:
Premium effect on retention:
- Highly engaged users: +2%
- Low-engagement users: +5%
These effects are controlled for engagement.
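A sketch of stratification with hypothetical cohort counts, picked so that the per-stratum effects match the +2% and +5% above. It also combines the strata into one adjusted figure by weighting each stratum by its share of users; that weighting choice and the function name retention are assumptions.

```python
# Hypothetical counts: (engagement, plan) -> (users, retained users)
cohorts = {
    ("high", "premium"): (900, 738),
    ("high", "free"): (300, 240),
    ("low", "premium"): (100, 35),
    ("low", "free"): (700, 210),
}

def retention(plan, engagement=None):
    """Retention rate for a plan, optionally within one engagement stratum."""
    rows = [v for (e, p), v in cohorts.items() if p == plan and engagement in (None, e)]
    return sum(kept for _, kept in rows) / sum(users for users, _ in rows)

naive_gap = retention("premium") - retention("free")

# Effect within each stratum, then a weighted average by stratum size.
total_users = sum(users for users, _ in cohorts.values())
adjusted_gap = 0.0
for stratum in ("high", "low"):
    gap = retention("premium", stratum) - retention("free", stratum)
    stratum_users = sum(users for (e, plan), (users, kept) in cohorts.items() if e == stratum)
    adjusted_gap += gap * stratum_users / total_users
    print(f"{stratum} engagement: {gap:+.1%}")

print(f"naive gap: {naive_gap:+.1%}, adjusted gap: {adjusted_gap:+.1%}")
```

In this made-up data the naive premium-versus-free gap is about ten times larger than the adjusted one, because premium users are concentrated in the highly engaged stratum.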
Regression
Control for confounders:
retention ~ is_premium + engagement + tenure + ...
The coefficient on is_premium is the effect controlled for the other covariates.
Matching
Propensity score matching: pair users who are similar on the confounders, then compare outcomes.
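A minimal sketch of nearest-neighbour matching with replacement. It assumes the propensity scores (the estimated probability of being premium) were already computed elsewhere, for example by a logistic regression; the users, scores, and outcomes below are invented.

```python
from statistics import mean

# Hypothetical users: (propensity score, retained days).
premium = [(0.90, 41.0), (0.80, 38.0), (0.50, 30.0), (0.30, 24.0)]
free = [(0.88, 39.0), (0.78, 35.0), (0.52, 29.0), (0.31, 22.0),
        (0.20, 15.0), (0.10, 12.0), (0.05, 10.0)]

def nearest(score, pool):
    """Pool member whose propensity score is closest to `score`."""
    return min(pool, key=lambda user: abs(user[0] - score))

# Match each premium user to the most similar free user,
# then average the paired outcome differences.
pair_gaps = [days - nearest(score, free)[1] for score, days in premium]
matched_effect = mean(pair_gaps)

naive_gap = mean(d for _, d in premium) - mean(d for _, d in free)

print(matched_effect)       # 2.0
print(round(naive_gap, 2))  # 10.11
```

The free users with very low scores have no premium counterpart, so they drop out of the comparison. That is the point of matching: outcomes are compared only between users who looked alike before treatment.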
In the analytics workflow
1. Be skeptical
"X correlates with Y" does not let you conclude "X causes Y".
2. List alternative explanations
Confounders? Reverse causation? Selection? Chance?
3. If possible, experiment
Run an A/B test if feasible.
4. If not, use observational methods
Causal inference techniques.
5. Communicate uncertainty
"The data suggest X improves Y, but causality is not confirmed."
Common mistakes
Confident causation without an experiment
"Our users who use feature X retain 30% better. Ship feature X everywhere."
Maybe. Or it is selection. Test it.
Over-interpreting a single study
Never rely on one analysis. Replicate and triangulate.
Ignoring the causal diagram
Not drawing a DAG means missing confounders.
Forgetting selection
Who is in the sample? Biased selection leads to biased conclusions.
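In the interview
"X correlates with Y. Is it causal?" Ask: confounders? Reverse causation? Selection? If unsure, run an A/B test.
"An example of confounding?" Ice cream sales and drownings (common cause: summer); smoking and exercise (common cause: age).
"How do you test causality?" A/B testing is the gold standard. Observational options: IV, DiD, matching.
"Can't run an A/B test?" Observational methods with explicit assumptions.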
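Standard correlational fallacies
Correlation of 0.9: strongly causal?
It is a strong correlation. Whether it is causal is a separate question.
Large N: causal?
A large N only makes the estimate more precise, so even a tiny correlation becomes statistically significant. It says nothing about causality.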
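A quick sketch of why sample size settles significance and nothing else. It uses the Fisher z-transform approximation for the p-value of a correlation; the helper name correlation_p_value and the value r = 0.02 are illustrative assumptions.

```python
from math import atanh, sqrt
from statistics import NormalDist

def correlation_p_value(r, n):
    """Approximate two-sided p-value for H0: no correlation (Fisher z-transform)."""
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# The same negligible correlation at two sample sizes.
p_small_sample = correlation_p_value(r=0.02, n=100)
p_huge_sample = correlation_p_value(r=0.02, n=1_000_000)

print(f"n=100: p = {p_small_sample:.2f}")
print(f"n=1,000,000: p = {p_huge_sample:.2g}")
```

The correlation is equally weak in both cases. With a million rows it is merely detectable, which is a statement about precision, not about strength or cause.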
An expert agrees?
Credentialism fallacy (appeal to authority). What counts is the evidence, not the authority.
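FAQ
Does big data solve this?
No. Big data gives you more correlations, not causes. Causal claims require a deliberate study design, ideally an experiment.
Is a strong correlation suspicious?
Sometimes, yes (if a causal link is not intuitive). Investigate.
Only A/B tests?
They are the best option. But in many situations you have to rely on observational data with proper methods.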