How to avoid bias in analysis
Карьерник is a Telegram quiz trainer with 1500+ questions for analyst interviews: SQL, Python, A/B, metrics. Free.
Why this matters
An analyst can produce a technically correct analysis but a biased interpretation, which leads to wrong business decisions. Biases are everywhere: in data collection, in analysis, in presentation.
A senior analyst knows how to flag and avoid biases. That is what separates them from a "SQL monkey".
Types of bias
1. Confirmation bias
You look for data that confirms your hypothesis and ignore data that contradicts it.
Fix: actively search for counter-evidence.
2. Survivorship bias
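You analyze only the "survivors" and miss the data that dropped out.
Example: the WWII aircraft armor story. Analysts looked at bullet holes on planes that returned, not on those shot down, so the holes marked places where a plane could be hit and still make it back.
Fix: think carefully about who is in the sample and who is missing.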
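A minimal simulation of survivorship bias, under made-up assumptions: each hypothetical project has a true return drawn from a standard normal distribution (mean 0), and only projects with a positive return "survive" into the dataset we get to see. The survivors-only mean lands near 0.8 (for a standard normal, E[X | X > 0] = √(2/π) ≈ 0.80), even though the true mean is 0.

```python
import random
import statistics


def survivorship_demo(n=10_000, cutoff=0.0, seed=42):
    """Return (mean over everyone, mean over survivors only).

    Hypothetical setup: returns ~ Normal(0, 1), and only items whose
    return is above `cutoff` stay in the dataset we get to analyze.
    """
    rng = random.Random(seed)
    returns = [rng.gauss(0.0, 1.0) for _ in range(n)]
    survivors = [r for r in returns if r > cutoff]
    return statistics.mean(returns), statistics.mean(survivors)


true_mean, survivor_mean = survivorship_demo()
print(f"all projects: {true_mean:.2f}, survivors only: {survivor_mean:.2f}")
```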
3. Selection bias
The sample does not represent the population.
Example: only users who completed the survey are analyzed → insights from churned users are missed.
Fix: random sampling, weighting.
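Weighting can be sketched as post-stratification: compute the mean within each segment, then combine the segment means using each segment's share in the population instead of its share in the sample. The segment names, shares and scores below are hypothetical, and the population shares are assumed to be known (for example from the full user table). Weighting only corrects imbalance on the variables you weight by; if the churned users who answered differ from those who did not, it cannot help.

```python
from collections import defaultdict


def poststratified_mean(sample, population_shares):
    """Mean of `sample` reweighted to known population shares.

    sample: list of (segment, value) pairs.
    population_shares: {segment: share}; shares are assumed to sum to 1.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for segment, value in sample:
        sums[segment] += value
        counts[segment] += 1
    missing = [seg for seg in population_shares if counts[seg] == 0]
    if missing:
        raise ValueError(f"no respondents in segments: {missing}")
    # Each segment contributes its population share times its sample mean.
    return sum(share * sums[seg] / counts[seg]
               for seg, share in population_shares.items())


# Hypothetical survey: active users answered far more often than churned ones.
sample = [("active", 4.5)] * 90 + [("churned", 2.0)] * 10
shares = {"active": 0.6, "churned": 0.4}  # assumed population mix
naive = sum(value for _, value in sample) / len(sample)
weighted = poststratified_mean(sample, shares)
print(f"naive: {naive:.2f}, reweighted: {weighted:.2f}")
```

Here the naive mean is 4.25 and the reweighted mean is 3.5: churned users are 40% of the assumed population but only 10% of the sample.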
4. Sampling bias
A specific subgroup is over-represented.
Fix: stratified sampling, verify representation.
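A minimal stratified sampling sketch: group rows by a stratum field and draw the same fraction from each group, so every subgroup keeps its population share. The field name `platform` and the counts are illustrative assumptions; in practice you would often do the same in SQL or with a dataframe library.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(rows, key, frac, seed=0):
    """Sample the same fraction `frac` from every stratum of `key`.

    Each non-empty stratum keeps round(frac * size) rows, but at least one,
    so small subgroups are never dropped by chance.
    """
    if not 0 < frac <= 1:
        raise ValueError("frac must be in (0, 1]")
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    sample = []
    for group in strata.values():
        k = max(1, round(frac * len(group)))
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical user base: 80% iOS, 15% Android, 5% web.
users = ([{"platform": "ios"}] * 800
         + [{"platform": "android"}] * 150
         + [{"platform": "web"}] * 50)
sample = stratified_sample(users, "platform", frac=0.1)
print(Counter(row["platform"] for row in sample))
```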
5. Recall bias
Self-reported data is unreliable (users forget or distort).
Fix: prefer behavioral data over self-reports.
6. Observer (Hawthorne) effect
Users who know they are being observed change their behavior.
Fix: natural observation, blinded studies.
7. Anchoring
The first number you see anchors your estimates.
Example: an initial price of $1000 marked down to $500 looks like a great deal, but the product may still be overpriced.
Fix: make an independent assessment before seeing others' numbers.
8. Availability heuristic
Events that are easy to recall seem more frequent.
Example: "a lot of churn recently", but maybe recent cases are simply top of mind.
Fix: always check the data rather than relying on intuition.
9. Publication / success bias
Published studies tend to report positive results.
Fix: pre-register analyses, publish null results.
10. Base rate fallacy
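Ignoring the background frequency (base rate).
Example: a test with 99% sensitivity and 95% specificity for a disease with 1% prevalence. A positive result means only about a 17% probability of having the disease, because false positives among the 99% of healthy people outnumber the true positives. See "Bayes in simple terms".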
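The arithmetic behind the base-rate example, written out with Bayes' theorem. The numbers (1% prevalence, 99% sensitivity, and 95% or 99% specificity) are illustrative; the point is that the answer depends heavily on the false positive rate among the 99% of people who are healthy.

```python
def posterior_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) by Bayes' theorem."""
    true_pos = sensitivity * prevalence               # sick and test positive
    false_pos = (1 - specificity) * (1 - prevalence)  # healthy but test positive
    return true_pos / (true_pos + false_pos)


print(round(posterior_positive(0.01, 0.99, 0.95), 3))  # 0.167
print(round(posterior_positive(0.01, 0.99, 0.99), 3))  # 0.5
```

With 95% specificity, false positives (4.95% of everyone tested) outnumber true positives (0.99%) five to one; even a test that is 99% accurate in both directions gives only a 50% chance.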
Biases in A/B tests
Peeking
Checking results repeatedly and stopping once they look significant inflates the false positive rate (FPR). This is the peeking problem.
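A small simulation of the peeking problem, under assumed parameters: A/A tests (both groups share a 10% conversion rate, so every "significant" result is a false positive), 2,000 users per group, a pooled two-proportion z-test, and a peek after every 200 users per group. It compares the false positive rate of one planned final look with that of stopping at the first peek where p < 0.05. The parameters are arbitrary; the exact inflation depends on how often you look.

```python
import math
import random


def z_test_pvalue(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))


def aa_false_positive_rates(n_sims=500, n_per_group=2000, step=200,
                            rate=0.1, alpha=0.05, seed=1):
    """Simulate A/A tests (no true effect) and return two false positive rates:
    (single planned look at the end, stop at the first significant peek)."""
    if n_per_group % step != 0:
        raise ValueError("n_per_group must be a multiple of step")
    rng = random.Random(seed)
    final_hits = peek_hits = 0
    for _ in range(n_sims):
        conv_a = conv_b = 0
        significant_at_some_peek = False
        for i in range(1, n_per_group + 1):
            conv_a += rng.random() < rate  # one more user in each group
            conv_b += rng.random() < rate
            if i % step == 0:  # interim look
                p = z_test_pvalue(conv_a, i, conv_b, i)
                significant_at_some_peek |= p < alpha
        final_hits += p < alpha  # last look = the planned final analysis
        peek_hits += significant_at_some_peek
    return final_hits / n_sims, peek_hits / n_sims


final_fpr, peeking_fpr = aa_false_positive_rates()
print(f"one final look: {final_fpr:.1%}, stop at first peek: {peeking_fpr:.1%}")
```

The final-look rate should come out close to the nominal 5%, while stopping at the first significant peek gives a rate several times higher, even though no variant is actually better.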
HARKing
Hypothesizing After the Results are Known: a post-hoc story built on random data.
Fix: pre-register hypotheses.
Cherry-picking
Picking the results that fit the narrative.
Fix: pre-defined primary metrics.
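Why pre-defined primary metrics matter: if you test k independent metrics with no real effect at α = 0.05 and report whichever one "wins", the chance of at least one false positive is 1 - (1 - α)^k. The sketch below computes that and, as one common correction that is not from the original text, the Bonferroni per-metric threshold α / k. Real product metrics are usually correlated, so treat the formula as an approximation.

```python
def family_wise_error(alpha, k):
    """Chance of at least one false positive among k independent tests
    when no tested effect is real."""
    return 1 - (1 - alpha) ** k


def bonferroni_alpha(alpha, k):
    """Per-test threshold that keeps the family-wise error at most alpha."""
    return alpha / k


for k in (1, 5, 20):
    print(k, round(family_wise_error(0.05, k), 3), bonferroni_alpha(0.05, k))
```

With 20 metrics the chance of at least one spurious "win" is about 64%.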
Biases in data
Measurement
A faulty instrument (a bug, an event firing incorrectly) → systematic error.
Fix: validate events, sanity checks.
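Two cheap sanity checks for event data, sketched under assumptions: the funnel steps are strictly sequential (a user cannot reach a step without passing the previous one), and a day whose volume falls below half the median day signals a tracking problem rather than real behavior. The step names, counts and the 0.5 threshold are hypothetical; tune them to your product.

```python
import statistics


def check_funnel(counts):
    """Flag funnel steps that have more events than the step before them.

    counts: list of (step_name, event_count) in funnel order. Assumes a
    strictly sequential funnel, so an increase usually means double firing
    or a mis-tagged event.
    """
    problems = []
    for (prev_name, prev_n), (name, n) in zip(counts, counts[1:]):
        if n > prev_n:
            problems.append(f"{name} ({n}) > {prev_name} ({prev_n})")
    return problems


def volume_drops(daily_counts, threshold=0.5):
    """Indices of days with fewer events than threshold * median day."""
    median = statistics.median(daily_counts)
    return [i for i, c in enumerate(daily_counts) if c < threshold * median]


funnel = [("view", 1000), ("add_to_cart", 300),
          ("checkout", 120), ("purchase", 150)]  # hypothetical counts
print(check_funnel(funnel))
print(volume_drops([1000, 980, 1020, 310, 990]))
```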
Missing data
Data that is not missing at random → biased results.
Fix: understand why the data is missing; impute thoughtfully.
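One way to start understanding why data is missing: compare the missing rate across segments. The field `nps` and the segment key `plan` are hypothetical. Very different rates mean the data is not missing completely at random, so simply dropping incomplete rows would skew results toward the segment that answers more often. This check cannot detect missingness that depends only on the unobserved value itself.

```python
from collections import defaultdict


def missing_rate_by(rows, field, segment_key):
    """Share of rows where `field` is missing (None), per segment."""
    total = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        segment = row[segment_key]
        total[segment] += 1
        missing[segment] += row.get(field) is None
    return {segment: missing[segment] / total[segment] for segment in total}


# Hypothetical survey export: free users skip the NPS question more often.
rows = ([{"plan": "free", "nps": None}] * 40
        + [{"plan": "free", "nps": 7}] * 60
        + [{"plan": "paid", "nps": None}] * 5
        + [{"plan": "paid", "nps": 9}] * 95)
print(missing_rate_by(rows, "nps", "plan"))
```

Here 40% of free users lack an NPS score versus 5% of paid users, so an average over complete rows would over-represent paid users.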
Survivorship through product evolution
When old features are removed, analysis of historical cohorts becomes biased.
Fix: analyze cohorts carefully and document product changes.
Cognitive biases in the team
Groupthink
The team agrees too quickly → alternatives get missed.
Fix: devil's advocate, outside reviewer.
Sunk cost
"We've invested so much, so we must continue."
Fix: evaluate options by their forward-looking expected value, not by past spending.
Status quo
"The current way is working", but a better one may exist.
Fix: periodically challenge assumptions.
Authority
"The lead data scientist said X." Their view may be overweighted.
Fix: question everything; rely on evidence.
How to protect yourself
1. Pre-register
Fix hypotheses, analyses, and metrics before looking at the data.
2. Multiple hypotheses
Generate 5-10 possible explanations and test each one.
3. Devil's advocate
Assign someone specifically to critique the analysis, and make it a formal role.
4. Look for missing
"Who is not in this data?" This is a selection bias check.
5. Sanity checks
Always check that totals add up, percentages sum to 100, and numbers are plausible.
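A minimal sketch of these checks for a report breakdown; the segment names, numbers, and the 0.5-point tolerance for rounded percentages are assumptions for illustration.

```python
import math


def sanity_check(parts, total, shares_pct=None, tol=0.5):
    """Return a list of problems found in a simple report breakdown.

    parts: {segment: count} that should add up to `total`.
    shares_pct: optional {segment: percent}; should sum to about 100,
    with `tol` points of slack for rounded percentages.
    """
    problems = []
    parts_sum = sum(parts.values())
    if not math.isclose(parts_sum, total):
        problems.append(f"segments sum to {parts_sum}, total is {total}")
    if shares_pct is not None:
        pct_sum = sum(shares_pct.values())
        if abs(pct_sum - 100) > tol:
            problems.append(f"percentages sum to {pct_sum}")
    if any(v < 0 for v in parts.values()):
        problems.append("negative counts")
    return problems


# Hypothetical report where 50 users fell out of the breakdown.
print(sanity_check({"ios": 500, "android": 300, "web": 150}, 1000,
                   {"ios": 50, "android": 30, "web": 15}))
```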
6. Peer review
Someone independent reviews the work before you communicate it.
7. Take the null hypothesis seriously
Default: "no effect". Reject it only given strong evidence.
Communicating bias
In presentations
"Limitations:" formally list biases and data quality issues.
Honesty builds trust.
Uncertainty
"Effect estimated at +5%, CI [2%, 8%]" shows the uncertainty instead of false precision.
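A sketch of how such an interval can be computed for a conversion A/B test, assuming a simple Wald (normal-approximation) interval for the absolute difference in conversion rates; the counts are made up. Note that it reports the difference in percentage points, not the relative lift.

```python
import math
from statistics import NormalDist


def diff_ci(conv_a, n_a, conv_b, n_b, level=0.95):
    """Estimate and Wald confidence interval for p_b - p_a,
    the absolute difference in conversion rates (normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    diff = p_b - p_a
    return diff, (diff - z * se, diff + z * se)


# Hypothetical test: 10.0% vs 11.0% conversion, 10,000 users per group.
diff, (lo, hi) = diff_ci(1000, 10_000, 1100, 10_000)
print(f"effect {diff * 100:+.2f} pp, 95% CI [{lo * 100:+.2f}, {hi * 100:+.2f}] pp")
```

Here the estimate is +1.0 pp with a 95% interval of roughly +0.15 to +1.85 pp: clearly positive, but far less precise than "+1%" alone suggests.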
Multiple scenarios
Best case / worst case / base case: helps stakeholders think through the range of outcomes.
Example: analysis
Q: "Sales of wireless headphones have been growing since 2022. Should we ship more wireless models?"
Possible biases
- Selection: only people with Bluetooth phones buy wireless; people still on old phones were left out
- Survivorship: only listings still on the shelves were analyzed, failed SKUs were not
- Sample: only the last 6 months, but is the market changing?
- Availability: a recent trend seems persistent, but may not be
Better analysis
- Cohort analysis by phone type
- Include discontinued products
- Longer time window
- Macro trend validation
In the interview
"What biases exist in analytics?"
List 3-5, with examples.
"How do you protect against confirmation bias?"
- Generate multiple hypotheses
- Specifically look for disconfirming evidence
- Peer review
"Example of survivorship bias?"
WWII aircraft armor. Or: churned users are never surveyed.
Common mistakes
Assuming objectivity
"I'm data-driven, I have no bias." Everyone has biases.
Dismissing qualitative data
Do only numbers matter? User interviews catch things the numbers miss.
Over-confidence
It's rare to be 100% sure. Communicate uncertainty.
FAQ
Can bias be eliminated completely?
No. You can minimize it and stay aware of it.
Which bias is the most dangerous?
It depends on the domain. Confirmation bias is universal; the others depend on context.
How do you train this skill?
Read Kahneman's "Thinking, Fast and Slow". Practice.
Train your critical thinking: open the trainer with 1500+ interview questions.