Data ethics for analysts
Карьерник is a quiz trainer in Telegram with 1500+ questions for analyst interviews. SQL, Python, A/B, metrics. Free.
Why this matters
Data is powerful, and misusing it causes real harm: analytics affects people's lives.
An ethical analyst questions the impact of their work. The topic can come up in senior-level interviews.
Key concerns
1. Privacy
Users' data must be handled carefully.
2. Bias
Biased analysis leads to biased decisions.
3. Fairness
Does the model have a disparate impact across groups?
4. Consent
Did the user agree to this use of their data?
5. Surveillance
Excessive tracking is invasive.
6. Manipulation
Is engagement being optimized at the user's expense?
Privacy
PII (personally identifiable information)
- Names, emails, phone numbers
- Addresses
- IDs (passport, СНИЛС)
- Health records
- Financial data
This data is sensitive and protected by law (152-ФЗ in Russia, GDPR in the EU).
Anonymization
Remove identifying information. But:
- Re-identification by inference is still possible (age + zip code + gender can single out a person)
- Techniques such as k-anonymity and l-diversity reduce that risk
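A minimal sketch of a k-anonymity check, assuming the data is a list of dicts and that the quasi-identifiers are known in advance. The field names and records are hypothetical. A dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k rows.

```python
from collections import Counter

def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())

# Hypothetical records: no names, but age band + zip + gender can still single someone out.
records = [
    {"age_band": "30-39", "zip": "101000", "gender": "F", "spend": 120},
    {"age_band": "30-39", "zip": "101000", "gender": "F", "spend": 80},
    {"age_band": "30-39", "zip": "101000", "gender": "M", "spend": 45},
    {"age_band": "40-49", "zip": "190000", "gender": "M", "spend": 300},
    {"age_band": "40-49", "zip": "190000", "gender": "M", "spend": 310},
]

k = k_anonymity(records, ["age_band", "zip", "gender"])
print(k)  # 1: the only 30-39 / 101000 / M row is unique, so it can be re-identified
```

A result of 1 means at least one person is unique on those fields. Coarsening the fields (wider age bands, shorter zip prefixes) or suppressing rare rows raises k.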
Access control
Limit access. Analysts should work with sanitized data.
Retention
Don't store data indefinitely. Delete what is obsolete.
In practice
- Don't download PII to a local machine
- Query through sanitized views
- Take security training
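One way a sanitized view might be built is sketched below, assuming direct identifiers are dropped and replaced with a keyed hash so rows can still be joined. `SECRET_KEY`, `pseudonymize`, `sanitize`, and the field names are all hypothetical. Pseudonymized data is generally still treated as personal data, so this reduces exposure rather than removing it.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would come from a secrets manager, not source code.
SECRET_KEY = b"replace-with-a-managed-secret"

def pseudonymize(value: str) -> str:
    """Replace an identifier with a keyed hash so rows can be joined but not read."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]

def sanitize(row: dict) -> dict:
    """Keep a pseudonymous key plus non-identifying fields; drop everything else."""
    return {
        "user_key": pseudonymize(row["email"]),
        "city": row["city"],
        "orders": row["orders"],
    }

raw = {"email": "anna@example.com", "phone": "+7 900 000-00-00", "city": "Kazan", "orders": 7}
clean = sanitize(raw)
print(sorted(clean))  # ['city', 'orders', 'user_key']
```

The allow-list style (copy only named safe fields) is a deliberate choice here: a new PII column added upstream does not leak into the view by default.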
Bias
Data bias
Training data reflects historical bias.
Example: a hiring model trained on historical hires that were predominantly one gender learns to prefer that gender.
Algorithmic bias
The algorithm itself introduces disparate outcomes.
Reporting bias
Results are presented in a way that favors one group.
Fix
- Diverse data sources
- Balanced training
- Regular fairness audits
- Explainability (SHAP)
Fairness
Definitions
There are multiple definitions, and they often conflict:
- Equal opportunity: same TPR across groups
- Demographic parity: same positive rate across groups
- Equalized odds: same TPR and FPR across groups
There are trade-offs. Usually you can't satisfy all of them at once.
Choose based on context.
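The three definitions above compare simple per-group rates, which can be sketched as follows. This is an illustration on toy data, assuming binary labels and predictions; `group_rates` is a hypothetical helper, not a library function.

```python
def group_rates(y_true, y_pred, groups):
    """Per-group positive rate, TPR, and FPR for binary labels and predictions."""
    stats = {}
    for g in sorted(set(groups)):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        tp = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 1)
        fp = sum(1 for i in idx if y_true[i] == 0 and y_pred[i] == 1)
        pos = sum(1 for i in idx if y_true[i] == 1)
        neg = len(idx) - pos
        stats[g] = {
            "positive_rate": (tp + fp) / len(idx),        # demographic parity compares this
            "tpr": tp / pos if pos else float("nan"),     # equal opportunity compares this
            "fpr": fp / neg if neg else float("nan"),     # equalized odds adds this
        }
    return stats

# Toy data: two groups with identical true labels but different predictions.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = group_rates(y_true, y_pred, groups)
for g, r in rates.items():
    print(g, r)
# A {'positive_rate': 0.75, 'tpr': 1.0, 'fpr': 0.5}
# B {'positive_rate': 0.25, 'tpr': 0.5, 'fpr': 0.0}
```

Here group A is favored under every definition. On real data the gaps often point in different directions, which is where the choice of definition starts to matter.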
Example
Credit scoring model:
- Is the true positive rate the same across demographic groups? (equal opportunity)
- Is the approval rate the same? (demographic parity)
Different choices give different notions of «fairness».
Audit
Check:
- Model performance per group
- False positive / negative rates
- Disparate impact ratio
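The disparate impact ratio from the checklist above can be sketched as the approval rate of one group divided by that of a reference group. The data and group labels are hypothetical. The 0.8 cutoff mentioned in the comment is the commonly cited «four-fifths» rule of thumb, used here as an assumption and not as a legal test.

```python
def disparate_impact_ratio(approved, groups, protected, reference):
    """Ratio of the protected group's approval rate to the reference group's."""
    def rate(g):
        outcomes = [a for a, grp in zip(approved, groups) if grp == g]
        return sum(outcomes) / len(outcomes)
    return rate(protected) / rate(reference)

# Hypothetical decisions: 1 = approved, 0 = rejected.
approved = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
groups = ["B", "B", "B", "B", "B", "A", "A", "A", "A", "A"]

ratio = disparate_impact_ratio(approved, groups, protected="B", reference="A")
print(round(ratio, 2))  # 0.5: group B is approved half as often as group A
# A ratio below roughly 0.8 is often treated as a signal to investigate further.
```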
Consent
Informed
Does the user know how their data is used?
A long T&C document is technically consent but ethically questionable.
Meaningful
A clear opt-in, not a pre-checked box.
Withdrawal
Can the user opt out or delete their data later?
Transparency
Users
Should know:
- What data is collected
- For what purpose
- Who it is shared with
- How long it is retained
Internal
Document:
- Data sources
- Processing
- Models
- Decisions
Manipulation
Dark patterns
UX tricks that work against the user:
- Hidden unsubscribe
- Confirmshaming («No, I don't want savings»)
- Hard-to-cancel subscriptions
These are ethical failures.
Addictive design
Endless scroll and dopamine hooks are still debated.
There is a tension between business goals and user wellbeing.
Price discrimination
Different prices for different users.
Legal in some contexts, but it raises ethical questions.
Misinformation
Presenting data
- Misleading charts
- Cherry-picked metrics
- Wrong context
It can be accidental or intentional. Both are dangerous.
Best practice
- Full context
- Limitations acknowledged
- Uncertainty quantified
- Alternative interpretations
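As one illustration of quantifying uncertainty, a reported rate can carry a confidence interval instead of a bare point estimate. The sketch below uses the Wilson score interval, which is one common choice among several. The conversion numbers are hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if n == 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Hypothetical result: 50 conversions out of 1000 sessions.
low, high = wilson_interval(50, 1000)
print(f"conversion = {50 / 1000:.1%}, 95% CI roughly [{low:.1%}, {high:.1%}]")
```

Reporting the interval next to the number makes it harder to read a small, noisy difference as a real effect.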
Ethical decision-making
Pause
Before starting an analysis, ask: «Any concerns?»
Question
«Would I want this done to me?»
«What could go wrong?»
Stakeholders
Consider every party the work affects.
Document
Record the rationale behind decisions, for future reference and accountability.
Cases
Recommendation systems
Do they create filter bubbles or drive polarization?
Measure diversity, not just engagement.
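One possible diversity metric, sketched under the assumption that each recommended item has a category label, is the Shannon entropy of the categories shown to a user. The category names are hypothetical, and entropy is only one of several reasonable measures.

```python
import math
from collections import Counter

def category_entropy(recommended_categories):
    """Shannon entropy (in bits) of the category mix in a user's recommendations."""
    counts = Counter(recommended_categories)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical feeds: one narrow, one evenly spread over four categories.
narrow_feed = ["politics", "politics", "politics", "politics"]
varied_feed = ["politics", "sports", "science", "music"]

print(category_entropy(narrow_feed) == 0)  # True: a single category carries no diversity
print(category_entropy(varied_feed))       # 2.0 bits: four equally likely categories
```

Tracking this alongside engagement shows whether an engagement gain came with a narrower feed.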
Pricing
When A/B testing a price change, run a disparate impact analysis.
Pay attention to vulnerable users (first-time vs savvy).
Credit scoring
Do model features inadvertently correlate with protected attributes?
In finance this is also a legal matter (regulation is stricter).
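A rough first pass at a proxy check is sketched below: correlate each candidate feature with a binary protected attribute and flag the strong ones. The feature names, the data, and the 0.5 threshold are all hypothetical. A linear correlation misses non-linear and combined proxies, so this is a screening step, not proof of absence.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def flag_proxies(features, protected, threshold=0.5):
    """Return features whose absolute correlation with the protected attribute is high."""
    flagged = {}
    for name, values in features.items():
        r = pearson(values, protected)
        if abs(r) >= threshold:
            flagged[name] = r
    return flagged

# Hypothetical data: protected attribute encoded as 0/1 for eight applicants.
protected = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "district_code": [1, 1, 2, 1, 5, 6, 5, 6],        # tracks the protected attribute
    "tenure_months": [12, 30, 7, 24, 12, 30, 7, 24],  # same distribution in both groups
}

flagged = flag_proxies(features, protected)
print(list(flagged))  # ['district_code']
```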
Hiring ML
High-risk. Requires careful validation and auditing.
Surveillance
Extensive tracking of employees or users. Is it invasive?
Frameworks
«Do no harm»
Baseline.
«Least invasive»
Collect the minimum data needed.
«User benefit»
Data use should benefit the user, not just the company.
«Transparency»
Openness about practices.
Laws
Russia: 152-ФЗ
Personal data protection.
- Consent is required
- Processing purposes must be clearly stated
- Data must be localized in Russia
- Data subject rights (access, deletion)
GDPR (EU)
Similar principles. Broader rights.
CCPA (California)
Similar.
Industry-specific
Banking, healthcare — stricter.
Professional responsibility
Speak up
If an analysis feels wrong, voice your concerns.
«I'm not comfortable with this. Let's discuss».
Alternatives
Propose an ethical alternative.
Document
If the ethical issue is serious, keep a paper trail.
Boundaries
In extreme cases, refuse the work or change jobs.
In interviews
Ethics questions are starting to appear:
«You're designing a feature. What are the ethical concerns?»
Structure:
- Identify stakeholders affected
- Privacy implications
- Fairness across groups
- Consent / transparency
- Propose mitigations
Example
«You're building a fraud model. Is it ethical?»
Issues:
- False positives (innocent users flagged)
- Bias (certain demographics flagged more often)
- Opaque decisions
- No appeal process
Mitigations:
- Monitor FPR per group
- Human review of edge cases
- Explainability
- Appeal workflow
An answer like this shows maturity.
Resources
Books
- «Weapons of Math Destruction» — Cathy O'Neil
- «Algorithms of Oppression» — Safiya Noble
- «Data Feminism» — D'Ignazio, Klein
Courses
- Data Science Ethics (Coursera)
- Fairness ML (Google)
Organizations
- Partnership on AI
- Distributed AI Research Institute
For the analyst
Daily
- Question extreme outputs
- Check disparate impact
- Respect privacy
- Honest reporting
Escalate
Escalate ethical red flags to your manager, legal, or the ethics board.
Continuous
The field evolves. Keep reading and discussing.
FAQ
Is there separate ethics training?
Often not, but there should be.
Is the analyst responsible?
Responsibility is shared. A principal analyst makes the final decisions.
Should you report problems?
Yes. Use anonymous channels where they exist.