23 апреля 2026 г.·5 мин чтения

Как писать postmortem в аналитике

Проверь себя · 1/3разбор после ответа

Есть таблицы payments(user_id) и refunds(user_id). Нужно получить пользователей, у которых был платёж, но не было ни одного возврата. Какой запрос корректнее всего описывает задачу?

Зачем это знать

Dashboard сломался. Metric неверно отчитался. A/B неверно интерпретирован. Иncidents в analytics — common.

Post-mortem — документ, анализирующий what happened → preventing recurrence.

Senior-culture. Часто expected от senior-analysts.

Что такое post-mortem

Docum после incident:

What happened
Timeline
Root cause
Impact
Preventive actions

Engineering practice, adopted by analytics teams.

Когда писать

Big incident

Wrong number в exec report
Dashboard показывал garbage для week
Critical A/B conclusion wrong
Data pipeline broken weeks

Medium

Metric definition change не communicated
Wrong insight shipped based on bad SQL
Important analysis took 3x expected

Small

Rarely formal post-mortem. Quick Slack update.

Структура

1. Title

Clear, factual: «2026-04-22: Revenue dashboard showed 2× actual для CEO report».

2. Summary

1-2 sentences.

«Revenue dashboard displayed doubled values from April 15-22 из-за typo in dbt model. Fixed April 22. CEO report communicated wrong trajectory».

3. Impact

Quantify если возможно:

Who affected
Decisions made based
Business impact estimated
Duration

4. Timeline

Chronological facts:

Time	Event
Apr 15, 09:00	dbt deploy introduces typo
Apr 15, 10:30	Dashboard refresh, new (wrong) values
Apr 18, 14:00	CEO questions «great numbers»
Apr 22, 09:00	Analyst notices inconsistency
Apr 22, 10:00	Investigation begins
Apr 22, 14:00	Root cause identified
Apr 22, 15:00	Fix deployed
Apr 22, 16:00	Correct values confirmed

5. Root cause

Why happened. Use 5 Whys.

«dbt model introduced x * 2 instead of x в revenue calc. QA pre-deploy не catch because model не had output verification test».

6. What went well

Positive things:

Caught within week
Fix deployed fast
No permanent data damage

7. What went poorly

Not caught pre-deploy
Took 7 days to notice
Impacted exec decision

8. Action items

Preventive, specific:

Add test для dbt model output sanity check. Owner: X. Due: Y.
Setup anomaly alert for significant metric jumps. Owner: Y. Due: Z.
Review process deploy changes к metric models. Action: Z. Due: W.

Each action — owner + deadline.

9. Lessons learned

General takeaways для team.

Blameless

Critical principle. Постmortem — learning, не blame.

Don't

«Analyst X deployed broken code».

Do

«Model deployed without output validation. Gap в process».

Focus process, not people.

Processes vs people

People are fallible. Process should catch mistakes.

If process failed → process fix.

Если individual negligent — separate HR, не post-mortem.

Tools

Template

Use consistent template across incidents. Easy scan.

Storage

Notion / Confluence wiki
Accessible
Searchable

Regular review

Monthly: review post-mortems. Team learning.

Готовься к собесу аналитика как в Duolingo

10 минут в день — SQL, Python, A/B, метрики. 1700+ вопросов в Telegram

Открыть Карьерник в Telegram

Internal

Team meeting
Post в analytics channel
Linked в team wiki

Cross-functional

Affects multiple teams → present к leadership. Transparency.

Execution of action items

Post-mortem только useful если actions executed.

Track в ticketing system (Jira)
Weekly review progress
Follow-through обязателен

Without — same incident recurs.

Culture

Safe environment

People shouldn't fear write post-mortem after mistake.

Emphasized — learning opportunity.

Celebrate

Good post-mortems — public recognition.

Read собes' others

Team-wide learning из post-mortems.

Пример (фиктивный)

2026-04-22: Quarterly revenue dashboard showed -15% instead of +8%

Summary: Stakeholders shocked by apparent -15% revenue drop в Q1 dashboard. Real number +8%. SQL error.

Impact:

CEO asked emergency meeting
CFO reviewed forecasts
Took 4 hours staff time
Dashboard used for week before fix

Timeline:

Apr 1: deploy new revenue model
Apr 2: dashboard refresh
Apr 22: stakeholder flags question
Apr 22 14:00: investigation
Apr 22 15:00: root cause
Apr 22 15:30: fix deployed

Root cause: New dbt model used amount_usd instead amount_local. USD dropped 20% vs local в Q1 → apparent revenue drop.

Actual reason: currency не translated уstream.

What went well:

Stakeholder courage к question data
Quick investigation и fix

What went poorly:

3-week delay от deploy к detection
No anomaly alert
PR review missed currency handling
No smoke test post-deploy

Action items:

Add dbt test «revenue total within 10% previous period». Owner: A. Due: Apr 29.
Anomaly alert для revenue metric. Owner: B. Due: Apr 30.
Smoke test template post-deploy. Owner: C. Due: May 5.
Currency handling guide в dbt docs. Owner: D. Due: May 3.

Lessons:

Metric definition changes требуют extra care
Multiple currencies — always check handling
Anomaly alerts — not optional на critical metrics

Templates

Many online examples. Adapt для your team.

Google Site Reliability Engineering Book — famous reference.

На собесе

«Написали ли post-mortem?»

Share пример:

What incident
Your role
What learned
Process improvement

Shows maturity.

«Blameless culture?»

Explain why важно.

Связанные темы

FAQ

Всегда ли писать?

Depends на severity. High-impact — yes. Low — quick note.

Часто share с team + affected stakeholders. Exec summary к exec.

Anonymous?

Usually не anonymous, но blameless. Name owners factually, не blame.