Как писать postmortem в аналитике
Карьерник — квиз-тренажёр в Telegram с 1500+ вопросами для собесов аналитика. SQL, Python, A/B, метрики. Бесплатно.
Зачем это знать
Dashboard сломался. Metric неверно отчитался. A/B неверно интерпретирован. Иncidents в analytics — common.
Post-mortem — документ, анализирующий what happened → preventing recurrence.
Senior-culture. Часто expected от senior-analysts.
Что такое post-mortem
Docum после incident:
- What happened
- Timeline
- Root cause
- Impact
- Preventive actions
Engineering practice, adopted by analytics teams.
Когда писать
Big incident
- Wrong number в exec report
- Dashboard показывал garbage для week
- Critical A/B conclusion wrong
- Data pipeline broken weeks
Medium
- Metric definition change не communicated
- Wrong insight shipped based on bad SQL
- Important analysis took 3x expected
Small
Rarely formal post-mortem. Quick Slack update.
Структура
1. Title
Clear, factual: «2026-04-22: Revenue dashboard showed 2× actual для CEO report».
2. Summary
1-2 sentences.
«Revenue dashboard displayed doubled values from April 15-22 из-за typo in dbt model. Fixed April 22. CEO report communicated wrong trajectory».
3. Impact
Quantify если возможно:
- Who affected
- Decisions made based
- Business impact estimated
- Duration
4. Timeline
Chronological facts:
| Time | Event |
|---|---|
| Apr 15, 09:00 | dbt deploy introduces typo |
| Apr 15, 10:30 | Dashboard refresh, new (wrong) values |
| Apr 18, 14:00 | CEO questions «great numbers» |
| Apr 22, 09:00 | Analyst notices inconsistency |
| Apr 22, 10:00 | Investigation begins |
| Apr 22, 14:00 | Root cause identified |
| Apr 22, 15:00 | Fix deployed |
| Apr 22, 16:00 | Correct values confirmed |
5. Root cause
Why happened. Use 5 Whys.
«dbt model introduced x * 2 instead of x в revenue calc. QA pre-deploy не catch because model не had output verification test».
6. What went well
Positive things:
- Caught within week
- Fix deployed fast
- No permanent data damage
7. What went poorly
- Not caught pre-deploy
- Took 7 days to notice
- Impacted exec decision
8. Action items
Preventive, specific:
- Add test для dbt model output sanity check. Owner: X. Due: Y.
- Setup anomaly alert for significant metric jumps. Owner: Y. Due: Z.
- Review process deploy changes к metric models. Action: Z. Due: W.
Each action — owner + deadline.
9. Lessons learned
General takeaways для team.
Blameless
Critical principle. Постmortem — learning, не blame.
Don't
«Analyst X deployed broken code».
Do
«Model deployed without output validation. Gap в process».
Focus process, not people.
Processes vs people
People are fallible. Process should catch mistakes.
If process failed → process fix.
Если individual negligent — separate HR, не post-mortem.
Tools
Template
Use consistent template across incidents. Easy scan.
Storage
- Notion / Confluence wiki
- Accessible
- Searchable
Regular review
Monthly: review post-mortems. Team learning.
Sharing
Internal
- Team meeting
- Post в analytics channel
- Linked в team wiki
Cross-functional
Affects multiple teams → present к leadership. Transparency.
Execution of action items
Post-mortem только useful если actions executed.
- Track в ticketing system (Jira)
- Weekly review progress
- Follow-through обязателен
Without — same incident recurs.
Culture
Safe environment
People shouldn't fear write post-mortem after mistake.
Emphasized — learning opportunity.
Celebrate
Good post-mortems — public recognition.
Read собes' others
Team-wide learning из post-mortems.
Пример (фиктивный)
2026-04-22: Quarterly revenue dashboard showed -15% instead of +8%
Summary: Stakeholders shocked by apparent -15% revenue drop в Q1 dashboard. Real number +8%. SQL error.
Impact:
- CEO asked emergency meeting
- CFO reviewed forecasts
- Took 4 hours staff time
- Dashboard used for week before fix
Timeline:
- Apr 1: deploy new revenue model
- Apr 2: dashboard refresh
- Apr 22: stakeholder flags question
- Apr 22 14:00: investigation
- Apr 22 15:00: root cause
- Apr 22 15:30: fix deployed
Root cause: New dbt model used amount_usd instead amount_local. USD dropped 20% vs local в Q1 → apparent revenue drop.
Actual reason: currency не translated уstream.
What went well:
- Stakeholder courage к question data
- Quick investigation и fix
What went poorly:
- 3-week delay от deploy к detection
- No anomaly alert
- PR review missed currency handling
- No smoke test post-deploy
Action items:
- Add dbt test «revenue total within 10% previous period». Owner: A. Due: Apr 29.
- Anomaly alert для revenue metric. Owner: B. Due: Apr 30.
- Smoke test template post-deploy. Owner: C. Due: May 5.
- Currency handling guide в dbt docs. Owner: D. Due: May 3.
Lessons:
- Metric definition changes требуют extra care
- Multiple currencies — always check handling
- Anomaly alerts — not optional на critical metrics
Templates
Many online examples. Adapt для your team.
Google Site Reliability Engineering Book — famous reference.
На собесе
«Написали ли post-mortem?»
Share пример:
- What incident
- Your role
- What learned
- Process improvement
Shows maturity.
«Blameless culture?»
Explain why важно.
Связанные темы
- Как провести root cause analysis
- Как найти причину падения метрики
- Data governance
- Как документировать аналитические задачи
FAQ
Всегда ли писать?
Depends на severity. High-impact — yes. Low — quick note.
Share широко или узко?
Часто share с team + affected stakeholders. Exec summary к exec.
Anonymous?
Usually не anonymous, но blameless. Name owners factually, не blame.
Тренируйте — откройте тренажёр с 1500+ вопросами для собесов.