Data ethics for analysts

Карьерник is a Telegram quiz trainer with 1500+ questions for analyst interviews: SQL, Python, A/B tests, metrics. Free.

Why this matters

Data is powerful, and misusing it causes real harm. Analytics affects people's lives.

An ethical analyst questions the impact of their own work. The topic can also come up in senior-level interviews.

Key concerns

1. Privacy

Users' data must be handled carefully.

2. Bias

Biased analysis leads to biased decisions.

3. Fairness

Does the model have a disparate impact across groups?

4. Consent

Has the user agreed to this use of their data?

5. Surveillance

Excessive tracking is invasive.

6. Manipulation

Are you optimizing engagement at the user's expense?

Privacy

PII (personally identifiable information)

  • Names, emails, phones
  • Addresses
  • IDs (passport, SNILS)
  • Health records
  • Financial data

This data is sensitive and protected by law (152-ФЗ in Russia, GDPR in the EU).

Anonymization

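Remove identifying information. But:

  • Re-identification by inference is still possible (for example, age + zip code + gender taken together)
  • Techniques such as k-anonymity (every quasi-identifier combination is shared by at least k records) and l-diversity (every such group holds at least l distinct sensitive values) reduce that risk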
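A minimal sketch of a k-anonymity check, assuming records are plain dicts. The column names (`age`, `zip`, `gender`) and the sample rows are hypothetical. The dataset's k is the size of the smallest group that shares one combination of quasi-identifier values.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same combination of quasi-identifier values (0 if empty)."""
    groups = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    return min(groups.values()) if groups else 0

# Hypothetical sample: names are already removed, yet one row is unique.
records = [
    {"age": 34, "zip": "101000", "gender": "F"},
    {"age": 34, "zip": "101000", "gender": "F"},
    {"age": 51, "zip": "190000", "gender": "M"},
]

k = k_anonymity(records, ["age", "zip", "gender"])
print(k)  # 1: the third record can be singled out
```

A result of 1 means at least one person is unique on those columns, so removing names alone did not anonymize the data.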

Access control

Limit access. Analysts should work with sanitized data, not raw PII.

Retention

Don't store data indefinitely. Delete what is obsolete.

In practice

  • Don't download PII to your local machine
  • Query through sanitized views
  • Complete security training
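One way a sanitized view could work, sketched under assumptions: the field names, the `PII_FIELDS` set and the `sanitize` helper are hypothetical, and the secret key is assumed to live outside the analytics environment. Direct identifiers are dropped and the user ID is replaced with a keyed hash so rows stay joinable.

```python
import hashlib
import hmac

# Hypothetical set of direct identifiers to drop before analysts see a row.
PII_FIELDS = {"name", "email", "phone", "address"}

def sanitize(row, secret_key):
    """Drop direct identifiers and replace user_id with a keyed hash,
    so rows can still be joined without exposing the raw ID."""
    clean = {k: v for k, v in row.items() if k not in PII_FIELDS}
    digest = hmac.new(secret_key, str(row["user_id"]).encode(), hashlib.sha256)
    clean["user_id"] = digest.hexdigest()
    return clean

row = {"user_id": 42, "name": "Ivan", "email": "ivan@example.com", "orders": 3}
print(sanitize(row, secret_key=b"stored-outside-the-warehouse"))
```

This is pseudonymization rather than anonymization: the remaining columns can still act as quasi-identifiers, so access control still applies.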

Bias

Data bias

Training data reflects historical bias.

Example: a hiring model is trained on historical hires that were predominantly one gender. The model learns to prefer that gender.

Algorithmic bias

The algorithm itself, not only the data, produces disparate outcomes.

Reporting bias

Results are presented in a way that favors one group.

Fix

  • Diverse data sources
  • Balanced training sets
  • Regular fairness audits
  • Explainability tools (for example, SHAP)

Fairness

Definitions

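There are several definitions, and they often conflict:

  • Equal opportunity: the same true positive rate (TPR) across groups
  • Demographic parity: the same rate of positive predictions across groups
  • Equalized odds: the same TPR and false positive rate (FPR) across groups

These are trade-offs: when base rates differ between groups, you generally cannot satisfy all of them at once.

Choose the definition based on context.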
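The three definitions can be compared on the same predictions. Below is a small sketch in plain Python; the function name `group_rates` and the toy labels are assumptions for illustration, with 1 as the positive class.

```python
from collections import defaultdict

def group_rates(y_true, y_pred, groups):
    """Return positive rate, TPR and FPR for each group."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        # "t"/"f" = prediction correct or not, "p"/"n" = predicted class
        key = ("t" if t == p else "f") + ("p" if p == 1 else "n")
        counts[g][key] += 1

    rates = {}
    for g, c in counts.items():
        total = sum(c.values())
        pos = c["tp"] + c["fn"]  # actual positives
        neg = c["fp"] + c["tn"]  # actual negatives
        rates[g] = {
            "positive_rate": (c["tp"] + c["fp"]) / total,
            "tpr": c["tp"] / pos if pos else None,
            "fpr": c["fp"] / neg if neg else None,
        }
    return rates

# Toy data: two groups of four people each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = group_rates(y_true, y_pred, groups)
for g, r in rates.items():
    print(g, r)
```

Equal opportunity compares `tpr`, demographic parity compares `positive_rate`, and equalized odds compares both `tpr` and `fpr`. In this toy data all three differ between A and B.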

Example

A credit scoring model:

  • Is the true positive rate the same across demographic groups? (equal opportunity)
  • Is the approval rate the same? (demographic parity)

Different choices give different kinds of «fairness».

Audit

Check:

  • Model performance per group
  • False positive / negative rates
  • Disparate impact ratio
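The disparate impact ratio from the checklist can be sketched as the favorable-outcome rate of one group divided by that of a reference group. The function name, group labels and data below are hypothetical. The 0.8 cutoff mentioned afterwards is the common «four-fifths» rule of thumb, not something this checklist prescribes.

```python
def disparate_impact_ratio(outcomes, groups, protected, reference):
    """Ratio of favorable-outcome rates: protected group / reference group.
    `outcomes` holds 1 for a favorable decision and 0 otherwise."""
    def rate(group):
        selected = [o for o, g in zip(outcomes, groups) if g == group]
        if not selected:
            raise ValueError(f"no records for group {group!r}")
        return sum(selected) / len(selected)

    ref_rate = rate(reference)
    if ref_rate == 0:
        raise ValueError("reference group has no favorable outcomes")
    return rate(protected) / ref_rate

# Hypothetical approvals: 3 of 4 in the reference group, 1 of 4 in the other.
outcomes = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["ref", "ref", "ref", "ref", "prot", "prot", "prot", "prot"]

ratio = disparate_impact_ratio(outcomes, groups, "prot", "ref")
print(round(ratio, 2))  # 0.33
```

A ratio well below 0.8 is usually treated as a signal to investigate, not as proof of discrimination.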

Consent

Informed

Does the user know how their data is used?

A long T&C document is technically consent but ethically questionable.

Meaningful

A clear opt-in versus a pre-checked box.

Withdrawal

Can the user opt out or delete their data later?

Transparency

Users

Users should know:

  • What data is collected
  • For what purpose
  • Who it is shared with
  • How long it is kept

Internal

Document:

  • Data sources
  • Processing
  • Models
  • Decisions

Manipulation

Dark patterns

UX tricks that work against the user:

  • Hidden unsubscribe
  • Confirmshaming («No, I don't want savings»)
  • Hard-to-cancel subscriptions

These are ethical failures.

Addictive design

Endless scroll and dopamine hooks are still debated.

There is a tension between business goals and user wellbeing.

Price discrimination

Different prices for different users.

Legal in some contexts, but it raises ethical questions.

Misinformation

Presenting data

  • Misleading charts
  • Cherry-picked metrics
  • Wrong context

It can be accidental or intentional. Both are dangerous.

Best practice

  • Full context
  • Limitations acknowledged
  • Uncertainty quantified
  • Alternative interpretations
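For «uncertainty quantified», one option is to report a rate with a confidence interval instead of a bare number. The sketch below uses the Wilson score interval, which is my choice for illustration, not a method the text names. The conversion counts are made up.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion.
    z=1.96 corresponds to roughly 95% confidence."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical result: 120 conversions out of 1000 sessions.
low, high = wilson_interval(120, 1000)
print(f"conversion 12.0%, 95% CI {low:.1%} to {high:.1%}")
```

Reporting the interval next to the point estimate makes it harder to oversell a difference that is within noise.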

Ethical decision-making

Pause

Before starting an analysis, ask: «Any concerns?»

Question

«Would I want this done to me?»

«What could go wrong?»

Stakeholders

Consider every party the work affects.

Document

Record the rationale behind decisions, for future reference and accountability.

Cases

Recommendation systems

Filter bubbles? Polarization?

Measure diversity, not just engagement.
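A rough way to put a number on diversity is the Shannon entropy of the categories a user was shown. This is one possible metric among many, and the category labels below are hypothetical.

```python
import math
from collections import Counter

def category_entropy(recommended_categories):
    """Shannon entropy (in bits) of the categories a user was shown.
    0 means everything came from one category."""
    counts = Counter(recommended_categories)
    total = sum(counts.values())
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values()
    )

narrow = ["politics", "politics", "politics", "politics"]
mixed = ["politics", "sport", "music", "science"]

print(category_entropy(narrow))  # no variety at all
print(category_entropy(mixed))   # four equally likely categories: 2 bits
```

Tracked alongside engagement, a falling entropy per user over time would be one hint that a filter bubble is forming.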

Pricing

When you A/B test a price change, run a disparate impact analysis.

Pay attention to vulnerable users (first-time versus savvy buyers).

Credit scoring

Do model features inadvertently correlate with protected attributes?
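A first-pass proxy check can be sketched as the correlation between each feature and a binary protected attribute. The feature names, the data and the 0.5 threshold are assumptions for illustration. Pearson correlation only catches linear relationships, so a clean result here is not proof that no proxy exists.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(vx * vy)

def proxy_candidates(features, protected, threshold=0.5):
    """Return features whose |correlation| with the protected
    attribute reaches the (hypothetical) threshold."""
    flagged = {}
    for name, values in features.items():
        r = pearson(values, protected)
        if abs(r) >= threshold:
            flagged[name] = r
    return flagged

# Hypothetical data: 1 marks membership in a protected group.
protected = [1, 1, 1, 0, 0, 0]
features = {
    "district_code": [9, 8, 9, 2, 1, 2],
    "basket_size": [3, 5, 4, 5, 3, 4],
}

flagged = proxy_candidates(features, protected)
print(flagged)  # only district_code is flagged
```

Here `district_code` is not a protected attribute itself, but it tracks one closely enough to act as a stand-in.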

This is a legal matter in finance, which is more closely regulated.

Hiring ML

High-risk. Requires careful validation and auditing.

Surveillance

Extensive tracking of employees or users. Is it invasive?

Frameworks

«Do no harm»

Baseline.

«Least invasive»

Collect the minimum data needed.

«User benefit»

Data use should benefit the user, not just the company.

«Transparency»

Openness about practices.

Laws

Russia: 152-ФЗ

Personal data protection.

  • Consent required
  • Clear processing purposes
  • Data localized in Russia
  • Data subject rights (access, deletion)

GDPR (EU)

Similar principles. Broader rights.

CCPA (California)

Similar.

Industry-specific

Banking and healthcare have stricter rules.

Professional responsibility

Speak up

If an analysis feels wrong, voice your concerns.

«I'm not comfortable with this. Let's discuss.»

Alternatives

Propose an ethical alternative.

Document

If the ethical issue is serious, keep a paper trail.

Boundaries

In extreme cases, refuse the work or change jobs.

In the interview

Ethics questions are emerging

«You are designing a feature. What are the ethical concerns?»

Answer structure:

  • Identify the affected stakeholders
  • Privacy implications
  • Fairness across groups
  • Consent / transparency
  • Propose mitigations

Example

«You are building a fraud model. Is it ethical?»

Issues:

  • False positives (innocents flagged)
  • Bias (certain demographics flagged more)
  • Opaque decisions
  • No appeal process

Mitigations:

  • Monitor FPR per group
  • Human review of edge cases
  • Explainability
  • Appeal workflow
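The human-review and appeal mitigations can be sketched as a routing rule on the model score. The thresholds and action names below are hypothetical. In practice they would be tuned against review capacity and the cost of false positives.

```python
def route_transaction(score, block_at=0.9, review_at=0.6):
    """Map a fraud score in [0, 1] to an action.
    Scores between the two thresholds go to a person, not to the model."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be within [0, 1]")
    if score >= block_at:
        return "block_with_appeal"  # the user can contest the decision
    if score >= review_at:
        return "human_review"       # edge case: a person decides
    return "approve"

for s in (0.1, 0.7, 0.95):
    print(s, route_transaction(s))
```

Keeping the uncertain middle band away from automatic blocking is what limits the damage from false positives.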

An answer like this shows maturity.

Resources

Books

  • «Weapons of Math Destruction» — Cathy O'Neil
  • «Algorithms of Oppression» — Safiya Noble
  • «Data Feminism» — D'Ignazio, Klein

Courses

  • Data Science Ethics (Coursera)
  • Fairness ML (Google)

Organizations

  • Partnership on AI
  • Distributed AI Research Institute

For the analyst

Daily

  • Question extreme outputs
  • Check disparate impact
  • Respect privacy
  • Report honestly

Escalate

Take ethical red flags to your manager, legal, or the ethics board.

Continuous

The field evolves. Keep reading and discussing.

FAQ

Is there separate ethics training?

Often not, but there should be.

Is the analyst responsible?

It is a shared responsibility. The principal analyst makes the final decisions.

Should you report problems?

Yes. Use anonymous channels where they exist.

