How to Conduct Root Cause Analysis

Why it matters

"The metric dropped" is the easy observation. "Why?" is the hard part. Without RCA you fix the symptom, not the cause, and the same problem comes back.

Senior analysts are known for getting to root causes. It is a differentiating skill.

What RCA is

Root Cause Analysis is a systematic process for identifying the underlying cause of a problem.

Goal: fix the root, not the symptoms.

5 Whys

A method developed at Toyota. Ask "why" five times.

Example

Problem: conversion dropped 20%.

  • Why? → Mobile users are converting less.
  • Why? → The checkout flow is slower on mobile.
  • Why? → The new payment integration adds latency.
  • Why? → It was not tested on slow connections.
  • Why? → QA does not include slow-network simulation.

Root cause: a gap in the QA process. Fixing QA prevents future issues of the same kind.

Not always 5

Sometimes 3, sometimes 7. Keep asking "why" until you reach an actionable root.

Fishbone (Ishikawa) diagram

A visual RCA technique for complex problems.

Categories (the classic 6M):

  • Man (people)
  • Method
  • Machine
  • Material
  • Measurement
  • Mother nature (environment)

For each category, list potential causes.

The diagram looks like a fish skeleton.

Map broadly first, then investigate each branch.

Pareto

The 80/20 principle: roughly 80% of problems come from 20% of causes.

Rank causes by frequency or impact. Focus on the top 20%.

Example: 10 types of user complaints, where the top 3 account for 70%. Fix those first.
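A minimal sketch of the ranking step, assuming complaint counts are already aggregated by type. The complaint names and counts are hypothetical, chosen so that the top 3 of 10 types cover 70% as in the example above.

```python
def pareto(counts: dict[str, int]) -> list[tuple[str, int, float]]:
    """Sort causes by count (descending) and attach the cumulative share in %."""
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for cause, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += n
        rows.append((cause, n, round(100 * cumulative / total, 1)))
    return rows


# Hypothetical complaint counts (10 types, 100 complaints in total).
complaints = {
    "payment_failed": 40, "slow_checkout": 20, "login_error": 10,
    "missing_email": 6, "wrong_price": 5, "app_crash": 5,
    "bad_search": 4, "delivery_late": 4, "ui_confusing": 3, "other": 3,
}

rows = pareto(complaints)
for cause, n, cum_pct in rows:
    print(f"{cause:15} {n:4} {cum_pct:6.1f}%")
```

Reading the cumulative column top to bottom shows where the curve flattens, which is a reasonable place to stop prioritizing.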

Data-driven RCA

For an analyst, RCA is data-based:

1. Define problem quantitatively

"CR dropped from 10% to 7.2%" is specific.
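When stating the drop, it helps to say whether the number is absolute (percentage points) or relative (percent of the old value). A small sketch with the figures above; the function name is illustrative only.

```python
def describe_drop(before: float, after: float) -> dict[str, float]:
    """Return the absolute change in percentage points and the relative change in %."""
    abs_pp = (after - before) * 100
    rel_pct = (after - before) / before * 100
    return {"abs_pp": round(abs_pp, 2), "rel_pct": round(rel_pct, 2)}


result = describe_drop(0.10, 0.072)
print(result)  # 2.8 percentage points down, which is a 28% relative drop
```

"CR dropped 2.8 pp" and "CR dropped 28%" describe the same event, so the problem statement should name the unit.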

2. Scope

When did it start? Who is affected?

3. Segment

Platform, country, user type, channel.

Which segment drives the drop?

4. Correlate

What else changed?

  • Releases
  • External events
  • Seasonality

5. Hypothesize

Based on the data and segmentation, generate hypotheses.

6. Test

Data queries, small experiments.

7. Confirm

Validate the root cause with evidence.

Example of data-driven RCA

Problem: DAU dropped 15% over the last week.

Step 1: Verify

Is the drop real, or is it a data pipeline issue?

Check raw events: yes, the drop is real.

Step 2: Timeline

The drop started on April 20. Before that, DAU was normal.
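One way to pin down the start date is to compare each day against a baseline from the preceding period. The sketch below assumes a 7-day baseline and a 10% threshold; both parameters and the daily DAU values are hypothetical.

```python
from statistics import mean


def find_onset(series, baseline_days=7, threshold=0.10):
    """Return the first day whose value is more than `threshold` below the baseline mean.

    `series` is a list of (day_label, value) pairs in chronological order.
    Returns None if no day crosses the threshold.
    """
    baseline = mean(value for _, value in series[:baseline_days])
    for day, value in series[baseline_days:]:
        if value < baseline * (1 - threshold):
            return day
    return None


# Hypothetical daily DAU, labeled month-day.
dau = [
    ("04-13", 100_000), ("04-14", 101_000), ("04-15", 99_000),
    ("04-16", 100_500), ("04-17", 99_500), ("04-18", 100_000),
    ("04-19", 100_000), ("04-20", 86_000), ("04-21", 85_000),
]

onset = find_onset(dau)
print(onset)
```

A fixed threshold is a simplification; for metrics with strong weekly seasonality, comparing against the same weekday of earlier weeks may work better.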

Step 3: Scope

  • iOS: -25%
  • Android: -5%
  • Web: 0%

→ Mostly an iOS issue.
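Per-segment percentages show where the drop is steepest, but not how much of the total each segment explains. The sketch below computes both. The baseline DAU counts are hypothetical, chosen only so that they reproduce the percentages above (-25%, -5%, 0%, and -15% overall).

```python
def drop_contributions(before: dict[str, int], after: dict[str, int]) -> dict[str, dict[str, float]]:
    """For each segment, return its own change in % and its share of the total change in %."""
    total_delta = sum(after.values()) - sum(before.values())
    if total_delta == 0:
        raise ValueError("total did not change; nothing to attribute")
    result = {}
    for segment in before:
        delta = after[segment] - before[segment]
        result[segment] = {
            "change_pct": round(100 * delta / before[segment], 1),
            "share_of_drop_pct": round(100 * delta / total_delta, 1),
        }
    return result


# Hypothetical DAU by platform before and after the drop.
before = {"iOS": 55_000, "Android": 25_000, "Web": 20_000}
after = {"iOS": 41_250, "Android": 23_750, "Web": 20_000}

contributions = drop_contributions(before, after)
for segment, stats in contributions.items():
    print(segment, stats)
```

Under these assumed volumes, iOS accounts for roughly 92% of the lost users, which supports treating it as the primary lead.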

Step 4: Correlate

On April 19, iOS app v5.0 was released.
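If the team keeps a change log, the correlation step can start with a simple lookup of what shipped shortly before the onset. A sketch under assumptions: the log entries other than the v5.0 release, the 3-day window, and the year are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical change log; the year is an arbitrary assumption.
CHANGE_LOG = [
    (date(2024, 4, 2), "Pricing page redesign"),
    (date(2024, 4, 19), "iOS app release v5.0"),
    (date(2024, 4, 25), "Marketing campaign start"),
]


def changes_before(onset, change_log, window_days=3):
    """Return changes dated within `window_days` before the onset date (inclusive)."""
    start = onset - timedelta(days=window_days)
    return [(day, name) for day, name in change_log if start <= day <= onset]


candidates = changes_before(date(2024, 4, 20), CHANGE_LOG)
print(candidates)
```

A match is only a lead for the hypothesis step, not proof of the cause.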

Step 5: Hypotheses

  1. A bug in the release
  2. Removal of a feature users loved
  3. Upgrade friction

Step 6: Test

  • Look at error rates: there is a spike in crashes
  • Most crashes occur in one specific flow

Hypothesis 1 confirmed.

Step 7: Fix

Ship a hotfix for the crash. Monitor the recovery.

Techniques

Fault tree analysis

Top-down. Start from "the event happened" and work back to root causes through AND/OR logic.

Used in safety-critical fields (aerospace, medicine). Less common in analytics.
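The AND/OR logic can be sketched as a small recursive evaluator. The tree below is a hypothetical example built for illustration, not a real incident model: the top event occurs if the payment API is down, or if a slow network coincides with a missing retry.

```python
def evaluate(node, facts):
    """Evaluate a fault tree node.

    A node is either a string (a basic event looked up in `facts`)
    or a tuple (gate, children) where gate is "AND" or "OR".
    """
    if isinstance(node, str):
        return facts.get(node, False)
    gate, children = node
    results = [evaluate(child, facts) for child in children]
    if gate == "AND":
        return all(results)
    if gate == "OR":
        return any(results)
    raise ValueError(f"unknown gate: {gate}")


# Hypothetical tree for the top event "checkout fails".
checkout_fails = ("OR", [
    "payment_api_down",
    ("AND", ["slow_network", "no_retry_on_timeout"]),
])

print(evaluate(checkout_fails, {"slow_network": True, "no_retry_on_timeout": True}))
print(evaluate(checkout_fails, {"slow_network": True}))
```

Under an AND gate, removing any one child prevents the parent event, which is why these gates often point to the cheapest fix.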

Pareto + 5 Whys

Combine: find the top category (Pareto), then dig deep (5 Whys).

Is / is not

  • Is: where, when, and what the problem is
  • Is not: where, when, and what it is not

The contrast narrows down the specific cause.
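The contrast can be made mechanical: look for attribute values shared by every affected case and absent from every unaffected one. A rough sketch; the attribute names and cases are hypothetical.

```python
def distinguishing_factors(is_cases, is_not_cases):
    """Return attributes whose value is identical across all 'is' cases
    and never appears among the 'is not' cases."""
    found = {}
    for key in is_cases[0].keys():
        is_values = {case[key] for case in is_cases}
        is_not_values = {case[key] for case in is_not_cases}
        if len(is_values) == 1 and not (is_values & is_not_values):
            found[key] = next(iter(is_values))
    return found


# Hypothetical cases: where the problem occurs and where it does not.
is_cases = [
    {"platform": "iOS", "version": "5.0", "country": "DE"},
    {"platform": "iOS", "version": "5.0", "country": "US"},
]
is_not_cases = [
    {"platform": "iOS", "version": "4.9", "country": "DE"},
    {"platform": "Android", "version": "3.2", "country": "US"},
]

factors = distinguishing_factors(is_cases, is_not_cases)
print(factors)
```

Here the platform alone does not separate the groups, because iOS 4.9 is unaffected; only the app version does.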

Common pitfalls

Stop too early

The first cause you find gets fixed, but it is not the root.

There is usually a deeper cause.

Blame people

"User error" is usually a systemic fault (bad UX, inadequate training).

Ignore data

A gut feeling about the cause is not enough. Verify with data.

Single cause

Complex problems often have multiple causes combined.

Recency bias

"Recent change = cause." Maybe. Verify.

Documentation

An RCA should be documented:

  • Problem description
  • Investigation timeline
  • Data collected
  • Hypotheses tested
  • Root cause found
  • Fix implemented
  • Prevention going forward

This serves future reference and team learning.

Post-mortem culture

Teams conduct RCA after issues:

  • Blameless
  • Systemic focus
  • Learning over blame
  • Published findings

Preventive actions

RCA → fix root cause → prevent recurrence.

But also:

  • Early warning signals
  • Guardrails / alerts
  • Monitoring improvements
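A guardrail alert can be as simple as comparing the latest value with recent history. The sketch below uses a z-score against the trailing mean; the 3-sigma threshold and the DAU values are hypothetical, and real monitoring would also need to account for seasonality.

```python
from statistics import mean, stdev


def should_alert(history, latest, max_sigmas=3.0):
    """Return True if `latest` is more than `max_sigmas` standard deviations
    below the mean of `history` (needs at least two history points)."""
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A flat history: any decrease at all is unusual.
        return latest < baseline
    return (baseline - latest) / spread > max_sigmas


# Hypothetical DAU for the previous 7 days.
history = [100_000, 101_000, 99_000, 100_500, 99_500, 100_000, 100_000]

print(should_alert(history, 85_000))
print(should_alert(history, 99_800))
```

The alert fires only on drops, since that is the failure mode discussed here; a symmetric check would use the absolute deviation.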

Beyond data

RCA applies to:

  • Code bugs
  • Process failures
  • Customer complaints
  • Team dysfunction

An analyst can contribute beyond pure data.

In an interview

"Metric X dropped. Investigate."

Walk through RCA:

  1. Verify
  2. Scope (timeline, segments)
  3. Correlate (what else changed)
  4. Hypothesize
  5. Test
  6. Validate root cause
  7. Recommend fix

A structured approach beats an ad-hoc one.

FAQ

Is a root cause always found?

Sometimes the answer is "unknown". A best effort still matters.

One technique or several?

Combine: Pareto → 5 Whys → fishbone for complex problems.

Team or solo?

Complex problems: team. Simple ones: solo.

