How to Conduct a Root Cause Analysis
Карьерник is a quiz trainer in Telegram with 1,500+ questions for analyst job interviews: SQL, Python, A/B testing, metrics. Free.
Why this matters
Noticing that "the metric dropped" is easy. Answering "why?" is hard. Without RCA you fix the symptom, not the cause, and the same problem comes back.
Senior analysts are known for getting to root causes. It is a differentiating skill.
What RCA is
Root Cause Analysis is a systematic process for identifying the underlying cause of a problem.
The goal: fix the root, not the symptoms.
5 Whys
A method developed at Toyota: ask "why" five times in a row.
Example
Problem: conversion dropped 20%.
- Why? → Mobile users convert less.
- Why? → The checkout flow is slower on mobile.
- Why? → A new payment integration adds latency.
- Why? → It was not tested on slow connections.
- Why? → QA does not include slow-network simulation.
Root cause: a gap in the QA process. Fixing QA prevents future issues of this kind.
Not always five
Sometimes three are enough, sometimes it takes seven. Keep asking "why" until you reach a root cause you can act on.
Fishbone (Ishikawa) diagram
A visual RCA tool for complex problems.
Categories (the classic 6M):
- Man (people)
- Method
- Machine
- Material
- Measurement
- Mother nature (environment)
For each category, list the potential causes.
The diagram looks like a fish skeleton.
Map broadly first, then investigate each branch.
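In practice a fishbone diagram can double as an investigation checklist. A minimal sketch that flattens the 6M branches into items to check one by one; the causes are illustrative placeholders for a conversion drop, not findings from the text.

```python
# Illustrative 6M fishbone for a conversion drop; every cause is a hypothetical placeholder.
FISHBONE = {
    "Man": ["New support team onboarding"],
    "Method": ["Changed checkout flow"],
    "Machine": ["Payment provider latency"],
    "Material": ["Outdated product catalog data"],
    "Measurement": ["Tracking event renamed in a release"],
    "Mother nature": ["Public holiday in a key market"],
}


def investigation_checklist(fishbone):
    """Flatten the diagram into (category, cause) pairs so no branch is skipped."""
    return [
        (category, cause)
        for category, causes in fishbone.items()
        for cause in causes
    ]


for category, cause in investigation_checklist(FISHBONE):
    print(f"[ ] {category}: {cause}")
```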
Pareto
The 80/20 principle: roughly 80% of problems come from 20% of causes.
Rank causes by frequency or impact and focus on the top 20%.
Example: there are 10 types of user complaints, and the top 3 account for 70% of them. Fix those first.
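A minimal sketch of the ranking step: sort causes by count and keep the smallest set whose cumulative share reaches a threshold. The complaint categories and counts are hypothetical, chosen to match the example (10 types, top 3 = 70%).

```python
def pareto_top(counts, threshold=0.8):
    """Return the smallest set of top causes whose cumulative share reaches `threshold`."""
    total = sum(counts.values())
    selected, cumulative = [], 0
    # Walk causes from the most to the least frequent.
    for cause, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(cause)
        cumulative += n
        if cumulative / total >= threshold:
            break
    return selected, cumulative / total


# Hypothetical complaint counts: 10 types, the top 3 make up 70 of 100.
complaints = {
    "late_delivery": 35, "payment_failed": 20, "login_issue": 15,
    "wrong_item": 6, "app_crash": 5, "refund_delay": 5,
    "promo_code": 4, "ui_confusion": 4, "notifications": 3, "other": 3,
}
top, share = pareto_top(complaints, threshold=0.7)
print(top, f"{share:.0%}")  # -> ['late_delivery', 'payment_failed', 'login_issue'] 70%
```

With the default 0.8 threshold the same data needs five causes (81%), which is a reminder that 80/20 is a heuristic, not a law.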
Data-driven RCA
For an analyst, RCA is data-based:
1. Define the problem quantitatively
"CR dropped from 10% to 7.2%" is specific.
2. Scope
When did it start? Who is affected?
3. Segment
Platform, country, user type, channel.
Which segment drives the drop?
4. Correlate
What else changed?
- Releases
- External events
- Seasonality
5. Hypothesize
Generate hypotheses based on the data and the segmentation.
6. Test
Check them with data queries and small experiments.
7. Confirm
Validate the root cause with evidence.
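Step 1 deserves care because "CR dropped 28%" and "CR dropped 2.8 points" describe the same move from 10% to 7.2%. A tiny helper, sketched here as one option, reports both so the problem statement is unambiguous.

```python
def describe_change(before, after):
    """Return (absolute change, relative change) of a metric.

    For a rate given in percent, the absolute change is in percentage points (pp).
    """
    absolute = after - before
    relative = absolute / before
    return absolute, relative


abs_pp, rel = describe_change(10.0, 7.2)  # CR: 10% -> 7.2%
print(f"CR: {abs_pp:+.1f} pp ({rel:+.0%} relative)")  # -> CR: -2.8 pp (-28% relative)
```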
Example of data-driven RCA
Problem: DAU dropped 15% over the last week.
Step 1: Verify
Is the drop real, and not a data pipeline issue?
Check the raw events: yes, the drop is real.
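One way to run this check is to recount DAU straight from raw events and compare it with the dashboard. A minimal sketch; the `(day, user_id)` event format, the 1% tolerance, and the sample events are assumptions for illustration.

```python
from collections import defaultdict


def dau_from_raw(events):
    """Recount DAU from raw (day, user_id) events: distinct users per day."""
    users_by_day = defaultdict(set)
    for day, user_id in events:
        users_by_day[day].add(user_id)
    return {day: len(users) for day, users in users_by_day.items()}


def mismatched_days(dashboard, recount, tolerance=0.01):
    """Days where the dashboard deviates from the raw recount by more than `tolerance`."""
    return [
        day for day, raw in recount.items()
        if abs(dashboard.get(day, 0) - raw) / raw > tolerance
    ]


# Hypothetical raw events; repeated events from the same user on a day count once.
events = [("04-19", "u1"), ("04-19", "u2"), ("04-19", "u1"), ("04-20", "u1")]
recount = dau_from_raw(events)
print(recount)  # -> {'04-19': 2, '04-20': 1}
print(mismatched_days({"04-19": 2, "04-20": 1}, recount))  # -> [] (dashboard matches raw data)
```

An empty list means the reported drop is also visible in the raw data, so it is not a pipeline artifact.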
Step 2: Timeline
It started on April 20. Before that, the metric was normal.
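Finding when a drop started can be automated with a simple rule: flag the first day that falls more than k standard deviations below a trailing baseline. A minimal sketch; the 14-day window, k = 3, and the series are assumptions, and a metric with strong weekly seasonality would need a seasonally adjusted baseline.

```python
from statistics import mean, stdev


def find_onset(series, baseline_days=14, k=3.0):
    """Index of the first value more than k sample standard deviations below
    the mean of the preceding `baseline_days` values, or None if there is none."""
    for i in range(baseline_days, len(series)):
        window = series[i - baseline_days:i]
        if series[i] < mean(window) - k * stdev(window):
            return i
    return None


# Hypothetical daily DAU (thousands): two stable weeks, then a drop.
series = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 102, 98, 100, 85, 84, 86]
print(find_onset(series))  # -> 14, the first day of the drop
```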
Step 3: Scope
- iOS: -25%
- Android: -5%
- Web: 0%
→ Mostly an iOS issue.
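The Scope step can be made quantitative: a segment's contribution to the total drop depends on both its own change and its share of users. A minimal sketch, assuming DAU is additive across platforms and segment shares are measured before the drop; the 55/25/20 platform split is hypothetical, picked so that the example's -25% / -5% / 0% reproduce the overall -15%.

```python
def drop_contributions(shares, changes):
    """Split a metric's total relative change into per-segment contributions.

    shares:  each segment's share of the metric before the drop (sums to 1).
    changes: each segment's relative change, e.g. -0.25 for -25%.
    Assumes the metric is additive across segments, so the total change is
    the share-weighted sum of the segment changes.
    """
    contrib = {seg: shares[seg] * changes[seg] for seg in shares}
    total = sum(contrib.values())
    if total == 0:
        return 0.0, {seg: 0.0 for seg in contrib}
    return total, {seg: c / total for seg, c in contrib.items()}


shares = {"iOS": 0.55, "Android": 0.25, "Web": 0.20}   # hypothetical platform mix
changes = {"iOS": -0.25, "Android": -0.05, "Web": 0.0}  # from the example above
total, parts = drop_contributions(shares, changes)
print(f"total change {total:+.0%}, iOS share of the drop {parts['iOS']:.0%}")
# -> total change -15%, iOS share of the drop 92%
```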
Step 4: Correlate
April 19: iOS app release v5.0.
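The Correlate step amounts to matching the onset date against a change log. A sketch assuming a hypothetical change log of `(date, description)` pairs and a 3-day lookback window; the year is arbitrary because the example gives only month and day, and the Android and marketing entries are invented for contrast.

```python
from datetime import date, timedelta


def changes_near(onset, change_log, days_before=3):
    """Change-log entries dated from `days_before` days before the onset up to the onset itself."""
    start = onset - timedelta(days=days_before)
    return [(day, what) for day, what in change_log if start <= day <= onset]


# Hypothetical change log; the year is arbitrary.
change_log = [
    (date(2024, 4, 2), "Android release v4.9"),
    (date(2024, 4, 19), "iOS app release v5.0"),
    (date(2024, 4, 25), "Marketing campaign start"),
]
print(changes_near(date(2024, 4, 20), change_log))  # only the iOS v5.0 release matches
```

A match is a candidate for a hypothesis, not proof: the pitfalls below about recency bias still apply.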
Step 5: Hypotheses
- A bug in the release
- A feature users loved was removed
- Friction when upgrading
Step 6: Test
- Look at error rates → crashes spiked
- Most crashes happen in one specific flow
Hypothesis 1 is confirmed.
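"Crashes spiked" can be checked formally with a standard two-proportion z-test that compares the share of sessions with a crash on v5.0 against the previous version. A sketch using only the standard library; the session and crash counts are hypothetical, and treating sessions as independent is an approximation.

```python
from math import erf, sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Two-sided two-proportion z-test: is rate x1/n1 different from x2/n2?

    Returns (z, p_value), using the pooled proportion for the standard error.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Standard normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Hypothetical: 600 of 20,000 sessions crashed on v5.0 vs 100 of 20,000 on the previous version.
z, p = two_proportion_z(600, 20_000, 100, 20_000)
print(f"z = {z:.1f}, p = {p:.2g}")  # a very large z: the crash rate on v5.0 is clearly higher
```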
Step 7: Fix
Ship a hotfix for the crash. Monitor the recovery.
Techniques
Fault tree analysis
Top-down: start from "the event happened" and work back to root causes through AND/OR logic.
Used in safety-critical fields (aerospace, medicine). Less common in analytics.
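A minimal sketch of the AND/OR logic, assuming independent basic events (a common simplification in fault tree analysis): an AND gate multiplies probabilities, an OR gate takes 1 - prod(1 - p). The tree, event names, and probabilities are hypothetical, loosely based on the conversion example above.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Gate:
    kind: str  # "AND" or "OR"
    children: list = field(default_factory=list)  # basic-event names or nested Gates


def probability(node, basic):
    """Probability of a fault-tree node, assuming independent basic events."""
    if isinstance(node, str):  # a basic event
        return basic[node]
    ps = [probability(child, basic) for child in node.children]
    if node.kind == "AND":  # all children must occur
        return prod(ps)
    if node.kind == "OR":  # at least one child occurs
        return 1 - prod(1 - p for p in ps)
    raise ValueError(f"unknown gate kind: {node.kind!r}")


# Hypothetical tree: the reported conversion drop happens if the pipeline breaks,
# OR a release is slow on bad networks AND QA lacks slow-network tests.
top = Gate("OR", ["pipeline_outage", Gate("AND", ["slow_release", "no_slow_network_qa"])])
basic = {"pipeline_outage": 0.01, "slow_release": 0.1, "no_slow_network_qa": 0.5}
print(round(probability(top, basic), 4))  # -> 0.0595
```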
Pareto + 5 Whys
Combine them: find the top category (Pareto), then dig deep into it (5 Whys).
Is / is not
- Is: where, when, and in what the problem occurs
- Is not: where, when, and in what it does NOT occur
The contrast points to the specific cause.
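The is / is not contrast can be run mechanically over segment attributes: keep the values that appear where the problem is and never where it is not. A minimal sketch; the segments and the "4.9" version are hypothetical, modeled on the DAU example.

```python
def is_is_not(affected, unaffected):
    """Attribute values that appear where the problem IS but never where it IS NOT.

    affected / unaffected: non-empty lists of dicts with the same attribute keys.
    """
    distinctions = {}
    for attr in affected[0]:
        is_values = {case[attr] for case in affected}
        is_not_values = {case[attr] for case in unaffected}
        only_is = is_values - is_not_values
        if only_is:
            distinctions[attr] = only_is
    return distinctions


# Hypothetical segments from the DAU example.
affected = [
    {"platform": "iOS", "app_version": "5.0", "country": "DE"},
    {"platform": "iOS", "app_version": "5.0", "country": "US"},
]
unaffected = [
    {"platform": "iOS", "app_version": "4.9", "country": "US"},
    {"platform": "Android", "app_version": "4.9", "country": "DE"},
]
print(is_is_not(affected, unaffected))  # -> {'app_version': {'5.0'}}
```

Here "iOS" alone does not separate the groups (unaffected iOS users exist), while the app version does.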
Common pitfalls
Stopping too early
You find the first cause and fix it, but it is not the root.
There is usually a deeper cause.
Blaming people
"User error." Usually the fault is systemic (bad UX, wrong training).
Ignoring data
Going with a gut feeling about the cause. Verify it with data.
Assuming a single cause
Complex problems often have several causes acting together.
Recency bias
"The recent change must be the cause." Maybe. Verify it.
Documentation
An RCA should be documented:
- Problem description
- Investigation timeline
- Data collected
- Hypotheses tested
- Root cause found
- Fix implemented
- Prevention going forward
This serves as a future reference and helps the team learn.
Post-mortem culture
Teams conduct an RCA after incidents:
- Blameless
- Systemic focus
- Learning over blame
- Published findings
Preventive actions
RCA → fix the root cause → prevent recurrence.
But also add:
- Early warning signals
- Guardrails / alerts
- Monitoring improvements
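One simple guardrail, sketched under assumptions: compare today's value with the same weekday a week earlier and alert when the relative drop exceeds a threshold. The function name, the 10% threshold, and the numbers in the usage lines are hypothetical.

```python
def wow_alert(today, same_day_last_week, max_drop=0.10):
    """Alert if a metric fell by more than `max_drop` vs the same weekday last week.

    Comparing with the same weekday partly neutralizes weekly seasonality.
    Returns (alert, relative_change).
    """
    change = (today - same_day_last_week) / same_day_last_week
    return change < -max_drop, change


print(wow_alert(85_000, 100_000))  # -> (True, -0.15)
print(wow_alert(97_000, 100_000))  # -> (False, -0.03)
```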
Beyond data
RCA applies to:
- Code bugs
- Process failures
- Customer complaints
- Team dysfunction
An analyst can contribute beyond pure data.
In the interview
"Metric X dropped. Investigate."
Walk through the RCA:
- Verify
- Scope (timeline, segments)
- Correlate (what else changed)
- Hypothesize
- Test
- Validate root cause
- Recommend fix
A structured approach beats an ad-hoc one.
Related topics
- How to find the cause of a metric drop
- How to tell signal from noise
- Correlation vs causation
- Metric drop case studies
FAQ
Can you always find the root cause?
Sometimes it stays "unknown". Best effort still matters.
Should you use a single technique?
Combine them: Pareto → 5 Whys → fishbone for complex cases.
Team or solo?
Complex problems: as a team. Simple ones: solo.