How to avoid bias in analysis


Why this matters

An analyst can produce a technically correct analysis and still interpret it in a biased way, which leads to wrong business decisions. Bias is everywhere: in data collection, in analysis, and in presentation.

A senior analyst knows how to flag and avoid biases. That is what separates them from a "SQL monkey".

Types of bias

1. Confirmation bias

You look for data that confirms your hypothesis and ignore data that contradicts it.

Fix: actively search for counter-evidence.

2. Survivorship bias

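You analyze only the "survivors" and miss the cases that dropped out.

Example: the WWII plane armor story. Analysts studied damage on the planes that returned, not on the ones that were shot down.

Fix: think carefully about who is in the sample and who is missing.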

3. Selection bias

The sample does not represent the population.

Example: surveying only users who completed the survey misses insights from churned users.

Fix: random sampling, weighting.

4. Sampling bias

A specific subgroup is over-represented in the sample.

Fix: stratified sampling, verify representation.
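
A minimal sketch of stratified sampling, using only the standard library. The `users` list, the `platform` field, and the 10% fraction are hypothetical and serve only as an illustration: the same fraction is drawn from every stratum, so the sample keeps the population's proportions.

```python
import random
from collections import defaultdict

def stratified_sample(rows, key, fraction, seed=0):
    """Draw the same fraction from every stratum defined by key(row)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for group in strata.values():
        # Take at least one row per stratum so small groups are not lost.
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical user base: 900 mobile users and 100 desktop users.
users = [{"id": i, "platform": "mobile" if i % 10 else "desktop"} for i in range(1000)]
sample = stratified_sample(users, key=lambda u: u["platform"], fraction=0.1)

# Verify representation: the desktop share in the sample should match the population.
share_desktop = sum(u["platform"] == "desktop" for u in sample) / len(sample)
```

Comparing `share_desktop` with the known population share is the "verify representation" step.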

5. Recall bias

Self-reported data is unreliable: users forget or distort things.

Fix: prefer behavioral data over self-reports.

6. Observer effect / Hawthorne effect

Users who know they are being observed change their behavior.

Fix: natural observation, blinded studies.

7. Anchoring

The first number you see anchors later estimates.

Example: the initial price is $1000 and the product is marked down to $500, so it looks like a great deal. It may still be overpriced.

Fix: make your own independent assessment before seeing others' estimates.

8. Availability heuristic

Easily recalled events seem more frequent than they are.

Example: "there has been a lot of churn recently", but maybe the recent cases are simply top of mind.

Fix: always check the data, not your intuition.

9. Publication / success bias

Published studies tend to have positive results.

Fix: pre-register analyses, publish null results.

10. Base rate fallacy

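You ignore the background frequency of the event.

Example: a disease test is 99% accurate (both sensitivity and specificity) and the disease is rare (1% prevalence). A positive test then means only about a 50% probability of disease; with a 5% false positive rate it drops to about 17%. See "Bayes in simple terms".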
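
The arithmetic can be checked with Bayes' rule. This is a sketch under stated assumptions: "99% accurate" is read as 99% sensitivity and 99% specificity, which gives a posterior of about 50%; a figure near 17% appears when the false positive rate is 5% (95% specificity) instead. The function name `posterior` is illustrative.

```python
def posterior(prevalence, sensitivity, specificity):
    """P(disease | positive test) by Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Assumption: "99% accurate" means sensitivity = specificity = 0.99.
p_99 = posterior(prevalence=0.01, sensitivity=0.99, specificity=0.99)

# Assumption: same sensitivity, but a 5% false positive rate.
p_95 = posterior(prevalence=0.01, sensitivity=0.99, specificity=0.95)

print(f"{p_99:.0%} vs {p_95:.0%}")
```

In both cases the posterior is far below 99%, because healthy people vastly outnumber sick ones and generate most of the positive results.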

Biases in A/B tests

Peeking

Checking the results multiple times before the test ends inflates the false positive rate (FPR). See: Peeking problem.
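
A small simulation can illustrate the inflation. It is a sketch, not a production test procedure: it runs A/A tests (no real difference), applies a two-proportion z-test at every look, and stops at the first "significant" result. The conversion rate, sample sizes, number of looks, and run count are arbitrary assumptions.

```python
import math
import random

def z_test_p(succ_a, succ_b, n):
    """Two-sided p-value for a difference in conversion rates (equal group sizes)."""
    pooled = (succ_a + succ_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 1.0
    z = (succ_a - succ_b) / n / se
    return math.erfc(abs(z) / math.sqrt(2))

def false_positive_rate(looks, n_per_look, rate=0.1, alpha=0.05, runs=500, seed=1):
    """Share of A/A tests declared significant at any of the looks."""
    rng = random.Random(seed)
    hits = 0
    for _run in range(runs):
        a = b = n = 0
        for _look in range(looks):
            # Both groups share the same true rate, so any "win" is a false positive.
            a += sum(rng.random() < rate for _ in range(n_per_look))
            b += sum(rng.random() < rate for _ in range(n_per_look))
            n += n_per_look
            if z_test_p(a, b, n) < alpha:
                hits += 1
                break
    return hits / runs

fpr_single = false_positive_rate(looks=1, n_per_look=2000)   # one check at the end
fpr_peeking = false_positive_rate(looks=10, n_per_look=200)  # ten interim checks
print(fpr_single, fpr_peeking)
```

With a single check at the end the rate should stay near the nominal 5%, while ten interim checks typically push it well above that.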

HARKing

Hypothesizing After the Results are Known: a post-hoc story built on random noise in the data.

Fix: pre-register hypotheses.

Cherry-picking

You pick only the results that fit the narrative.

Fix: pre-defined primary metrics.
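
One way to see why a pre-defined primary metric matters: assuming the metrics are independent and none has a real effect, the chance that at least one looks "significant" grows quickly with the number of metrics inspected. The function name and the choice of 20 metrics are illustrative.

```python
def family_wise_error(k, alpha=0.05):
    """P(at least one false positive) across k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k

one_metric = family_wise_error(1)       # a single pre-defined primary metric
twenty_metrics = family_wise_error(20)  # scanning 20 metrics for a "win"
print(f"{one_metric:.0%} vs {twenty_metrics:.0%}")
```

Real metrics are usually correlated, so the exact number will differ, but the direction is the same: the more metrics you can choose from after the fact, the easier it is to find a spurious winner.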

Biases in data

Measurement

A faulty instrument (a bug, an event firing incorrectly) introduces systematic error.

Fix: validate events, sanity checks.

Missing data

Data that is missing not at random leads to biased results.

Fix: understand why the data is missing and impute thoughtfully.
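
A toy simulation of non-random missingness, under a made-up assumption: users rate a product from 1 to 10, and the probability of answering grows with the score (unhappy users tend not to respond). All numbers are synthetic.

```python
import random
import statistics

rng = random.Random(42)

# Synthetic "true" scores for every user, uniform from 1 to 10.
scores = [rng.randint(1, 10) for _ in range(10_000)]

# Assumption: a user with score s answers with probability s / 10,
# so low scores are missing far more often than high ones.
observed = [s for s in scores if rng.random() < s / 10]

true_mean = statistics.mean(scores)
observed_mean = statistics.mean(observed)
print(f"true {true_mean:.2f}, observed {observed_mean:.2f}")
```

The observed mean comes out noticeably higher than the true one, even though every individual answer is accurate. Filling the gaps with the observed mean would only lock that bias in.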

Survivorship through product evolution

Old features get removed, so analysis of historical cohorts becomes biased.

Fix: run cohort analysis carefully and document product changes.

Cognitive biases in a team

Groupthink

The team agrees too quickly and misses alternatives.

Fix: devil's advocate, outside reviewer.

Sunk cost

"We have invested so much, so we must continue."

Fix: evaluate on forward expected value.

Status quo

"The current way is working", but a better one may exist.

Fix: periodically challenge assumptions.

Authority

"The lead data scientist said X." The team may overvalue an opinion because of who stated it.

Fix: question everything and rely on evidence.

How to protect yourself

1. Pre-register

Fix hypotheses, analyses, and metrics before looking at the data.

2. Multiple hypotheses

Generate 5-10 possible explanations. Test each.

3. Devil's advocate

Assign someone specifically to critique the analysis. Make the role formal.

4. Look for missing

"Who is not in this data?" This is a check for selection bias.

5. Sanity checks

Always check that totals add up, percentages sum to 100, and the numbers are reasonable.
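
These checks are easy to automate. Below is a minimal sketch; the function `sanity_check`, the segment names, and the rounding tolerance of 0.5 are all hypothetical choices, not a standard.

```python
import math

def sanity_check(segment_counts, reported_total, shares_pct):
    """Return a list of problems found in a report's headline numbers."""
    problems = []
    if sum(segment_counts.values()) != reported_total:
        problems.append("segment counts do not add up to the total")
    # Allow a small tolerance for rounded percentages.
    if not math.isclose(sum(shares_pct.values()), 100.0, abs_tol=0.5):
        problems.append("percentages do not sum to 100")
    if any(count < 0 for count in segment_counts.values()):
        problems.append("negative count")
    return problems

good = sanity_check({"ios": 600, "android": 400}, 1000, {"ios": 60.0, "android": 40.0})
bad = sanity_check({"ios": 600, "android": 450}, 1000, {"ios": 60.0, "android": 45.0})
print(good, bad)
```

An empty list means the headline numbers are at least internally consistent; it says nothing about whether they are right.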

6. Peer review

Someone independent reviews the analysis before you communicate it.

7. Take the null hypothesis seriously

The default is "no effect". Reject it only on strong evidence.

Communicating bias

In the presentation

Add a "Limitations" section that formally lists biases and data quality issues.

Honesty builds trust.

Uncertainty

"Effect estimated at +5%, CI [2%, 8%]" shows the uncertainty instead of false precision.
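
A sketch of how such an interval could be produced for a conversion-rate difference, using a normal-approximation (Wald) interval. The conversion counts are made up, and the effect here is an absolute difference in percentage points, which is an assumption about how the effect is defined.

```python
import math
from statistics import NormalDist

def diff_ci(conv_a, n_a, conv_b, n_b, level=0.95):
    """Absolute difference in conversion rates (B minus A) with a Wald interval."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p_b - p_a
    return diff, diff - z * se, diff + z * se

# Hypothetical experiment: 10,000 users per group.
diff, low, high = diff_ci(conv_a=1000, n_a=10_000, conv_b=1100, n_b=10_000)

# Values are absolute differences, printed in percentage points.
print(f"Effect {diff * 100:+.1f} pp, 95% CI [{low * 100:+.1f}, {high * 100:+.1f}]")
```

Reporting the interval next to the point estimate lets stakeholders see how much of the effect could be noise.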

Multiple scenarios

Best case, worst case, and base case scenarios help stakeholders think.

Example: analysis

Q: "Sales of wireless headphones have been growing since 2022. Should we ship more wireless?"

Possible biases

  • Selection: only people with Bluetooth phones buy wireless headphones. People stuck with old phones are left out
  • Survivorship: only listings still on the shelves are analyzed, failed SKUs are not
  • Sample: only the last 6 months, but is the market changing?
  • Availability: a recent trend seems persistent, but it may not be

Better analysis

  • Cohort analysis by phone type
  • Include discontinued products
  • Longer time window
  • Macro trend validation

In the interview

"What biases are there in analytics?"

List 3-5, with examples.

"How do you protect against confirmation bias?"

  • Generate multiple hypotheses
  • Specifically look for disconfirming evidence
  • Peer review

"An example of survivorship bias?"

WWII plane armor. Or: churned users who are never surveyed.

Common mistakes

Assuming objectivity

"I'm data-driven, I have no bias." Everyone has biases.

Dismissing qualitative data

Do only numbers matter? User interviews catch things that metrics miss.

Over-confidence

You can rarely be 100% sure. Communicate uncertainty.


FAQ

Can bias be removed completely?

No. You can only minimize it and stay aware of it.

Which bias is the most dangerous?

It depends on the domain. Confirmation bias is universal; the others depend on context.

How do you train this?

Read Kahneman's "Thinking, Fast and Slow". Practice.

