Как писать postmortem в аналитике

Карьерник — квиз-тренажёр в Telegram с 1500+ вопросами для собесов аналитика. SQL, Python, A/B, метрики. Бесплатно.

Зачем это знать

Dashboard сломался. Metric неверно отчитался. A/B неверно интерпретирован. Иncidents в analytics — common.

Post-mortem — документ, анализирующий what happened → preventing recurrence.

Senior-culture. Часто expected от senior-analysts.

Что такое post-mortem

Docum после incident:

  • What happened
  • Timeline
  • Root cause
  • Impact
  • Preventive actions

Engineering practice, adopted by analytics teams.

Когда писать

Big incident

  • Wrong number в exec report
  • Dashboard показывал garbage для week
  • Critical A/B conclusion wrong
  • Data pipeline broken weeks

Medium

  • Metric definition change не communicated
  • Wrong insight shipped based on bad SQL
  • Important analysis took 3x expected

Small

Rarely formal post-mortem. Quick Slack update.

Структура

1. Title

Clear, factual: «2026-04-22: Revenue dashboard showed 2× actual для CEO report».

2. Summary

1-2 sentences.

«Revenue dashboard displayed doubled values from April 15-22 из-за typo in dbt model. Fixed April 22. CEO report communicated wrong trajectory».

3. Impact

Quantify если возможно:

  • Who affected
  • Decisions made based
  • Business impact estimated
  • Duration

4. Timeline

Chronological facts:

Time Event
Apr 15, 09:00 dbt deploy introduces typo
Apr 15, 10:30 Dashboard refresh, new (wrong) values
Apr 18, 14:00 CEO questions «great numbers»
Apr 22, 09:00 Analyst notices inconsistency
Apr 22, 10:00 Investigation begins
Apr 22, 14:00 Root cause identified
Apr 22, 15:00 Fix deployed
Apr 22, 16:00 Correct values confirmed

5. Root cause

Why happened. Use 5 Whys.

«dbt model introduced x * 2 instead of x в revenue calc. QA pre-deploy не catch because model не had output verification test».

6. What went well

Positive things:

  • Caught within week
  • Fix deployed fast
  • No permanent data damage

7. What went poorly

  • Not caught pre-deploy
  • Took 7 days to notice
  • Impacted exec decision

8. Action items

Preventive, specific:

  • Add test для dbt model output sanity check. Owner: X. Due: Y.
  • Setup anomaly alert for significant metric jumps. Owner: Y. Due: Z.
  • Review process deploy changes к metric models. Action: Z. Due: W.

Each action — owner + deadline.

9. Lessons learned

General takeaways для team.

Blameless

Critical principle. Постmortem — learning, не blame.

Don't

«Analyst X deployed broken code».

Do

«Model deployed without output validation. Gap в process».

Focus process, not people.

Processes vs people

People are fallible. Process should catch mistakes.

If process failed → process fix.

Если individual negligent — separate HR, не post-mortem.

Tools

Template

Use consistent template across incidents. Easy scan.

Storage

  • Notion / Confluence wiki
  • Accessible
  • Searchable

Regular review

Monthly: review post-mortems. Team learning.

Sharing

Internal

  • Team meeting
  • Post в analytics channel
  • Linked в team wiki

Cross-functional

Affects multiple teams → present к leadership. Transparency.

Execution of action items

Post-mortem только useful если actions executed.

  • Track в ticketing system (Jira)
  • Weekly review progress
  • Follow-through обязателен

Without — same incident recurs.

Culture

Safe environment

People shouldn't fear write post-mortem after mistake.

Emphasized — learning opportunity.

Celebrate

Good post-mortems — public recognition.

Read собes' others

Team-wide learning из post-mortems.

Пример (фиктивный)

2026-04-22: Quarterly revenue dashboard showed -15% instead of +8%

Summary: Stakeholders shocked by apparent -15% revenue drop в Q1 dashboard. Real number +8%. SQL error.

Impact:

  • CEO asked emergency meeting
  • CFO reviewed forecasts
  • Took 4 hours staff time
  • Dashboard used for week before fix

Timeline:

  • Apr 1: deploy new revenue model
  • Apr 2: dashboard refresh
  • Apr 22: stakeholder flags question
  • Apr 22 14:00: investigation
  • Apr 22 15:00: root cause
  • Apr 22 15:30: fix deployed

Root cause: New dbt model used amount_usd instead amount_local. USD dropped 20% vs local в Q1 → apparent revenue drop.

Actual reason: currency не translated уstream.

What went well:

  • Stakeholder courage к question data
  • Quick investigation и fix

What went poorly:

  • 3-week delay от deploy к detection
  • No anomaly alert
  • PR review missed currency handling
  • No smoke test post-deploy

Action items:

  1. Add dbt test «revenue total within 10% previous period». Owner: A. Due: Apr 29.
  2. Anomaly alert для revenue metric. Owner: B. Due: Apr 30.
  3. Smoke test template post-deploy. Owner: C. Due: May 5.
  4. Currency handling guide в dbt docs. Owner: D. Due: May 3.

Lessons:

  • Metric definition changes требуют extra care
  • Multiple currencies — always check handling
  • Anomaly alerts — not optional на critical metrics

Templates

Many online examples. Adapt для your team.

Google Site Reliability Engineering Book — famous reference.

На собесе

«Написали ли post-mortem?»

Share пример:

  • What incident
  • Your role
  • What learned
  • Process improvement

Shows maturity.

«Blameless culture?»

Explain why важно.

Связанные темы

FAQ

Всегда ли писать?

Depends на severity. High-impact — yes. Low — quick note.

Share широко или узко?

Часто share с team + affected stakeholders. Exec summary к exec.

Anonymous?

Usually не anonymous, но blameless. Name owners factually, не blame.


Тренируйте — откройте тренажёр с 1500+ вопросами для собесов.