Anomaly detection for analysts

Карьерник is a Telegram quiz trainer with 1500+ questions for analyst job interviews: SQL, Python, A/B testing, metrics. Free.

Why it matters

If the team learns about a metric drop three days later, revenue has already been lost. Automated anomaly detection catches drops as they happen. A common senior-level interview question: "How would you build an alert system?"

At large companies (Netflix, Uber), entire teams work on anomaly detection. A middle analyst should know at least the basics.

Short explanation

Anomaly detection is the identification of points that deviate significantly from the expected pattern.

Aim: catch drops, spikes, or unusual behavior before it is too late.

Methods

1. Statistical (simple)

Z-score

z = (x - mean) / std

If |z| > 3, the point is an anomaly.

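Simple and fast. Works well when the data is roughly normal; keep in mind that the mean and std are themselves pulled by the outliers they are meant to catch.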
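A minimal sketch of the z-score rule in plain Python (standard library only). `zscore_anomalies` is a hypothetical helper; it uses the population standard deviation and the |z| > 3 cut-off from above, and the data is made up for illustration.

```python
from statistics import mean, pstdev


def zscore_anomalies(values: list[float], threshold: float = 3.0) -> list[int]:
    """Return indices of points whose |z| exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # constant series: nothing deviates
    return [i for i, x in enumerate(values) if abs(x - mu) / sigma > threshold]


# 30 days of a stable metric with one spike on day 12
daily = [100, 102, 98, 101, 99] * 6
daily[12] = 150
print(zscore_anomalies(daily))  # [12]
```

One design consequence: by Samuelson's inequality, with the population std no point in a sample of n values can have |z| above √(n−1). On 10 or fewer points the |z| > 3 rule can therefore never fire, so mean and std should come from a long enough baseline.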

IQR

Q1 = 25th percentile
Q3 = 75th percentile
IQR = Q3 - Q1
Outlier: x < Q1 - 1.5×IQR or x > Q3 + 1.5×IQR

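Robust to outliers: the quartiles barely move when a few extreme values appear. This is the same rule that sets the whiskers of a box plot.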
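The same rule as a short standard-library sketch (`iqr_outliers` is a hypothetical name, the data is made up). `statistics.quantiles(..., method="inclusive")` uses the same linear interpolation as the NumPy/pandas percentile defaults.

```python
from statistics import quantiles


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # method="inclusive" = linear interpolation, the numpy/pandas default
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]


data = [10, 12, 11, 13, 12, 11, 10, 14, 12, 95]
print(iqr_outliers(data))  # [95]
```

Here the quartiles are 11 and 12.75, so the fences are 8.375 and 15.375 and only 95 falls outside. The same 95 inflates the standard deviation to about 25 and gets |z| just under 3, so the z-score rule would miss it: that is what robustness buys.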

2. Rolling window

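# metric: pandas Series of daily values, sorted by date
expected = metric.rolling(window=7).mean().shift(1)  # mean of the previous 7 days
threshold = 3 * metric.rolling(window=7).std().shift(1)  # 3 × std of the same window
anomaly = (metric - expected).abs() > threshold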

Adapts to trends. Popular in monitoring.
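A standard-library sketch of the same rolling rule (hypothetical helper `rolling_anomalies`, made-up data). It compares each point with the mean and sample standard deviation of the previous `window` points only, so the tested point does not dilute its own baseline; this matches the `ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING` frame in the SQL version.

```python
from statistics import mean, stdev


def rolling_anomalies(values: list[float], window: int = 7, k: float = 3.0) -> list[int]:
    """Flag points more than k std away from the mean of the previous `window` points."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]  # previous points only: the tested point is excluded
        expected = mean(history)
        spread = stdev(history)  # sample std (ddof=1), like pandas' default .std()
        if abs(values[i] - expected) > k * spread:
            flagged.append(i)
    return flagged


series = [100 + day for day in range(20)]  # steady upward trend: +1 per day
series[15] = 60  # sudden drop on day 15
print(rolling_anomalies(series))  # [15]
```

The steady upward trend is never flagged because the baseline moves with it. The flip side: for the next 7 days the drop sits inside the window and inflates the std, so a second incident that week would be harder to catch; some setups therefore exclude already-flagged points from the baseline.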

3. Seasonal decomposition

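from statsmodels.tsa.seasonal import seasonal_decompose

# data: pandas Series of daily values; period=7 captures weekly seasonality
result = seasonal_decompose(data, model='additive', period=7)
residuals = result.resid  # NaN for the first and last 3 days (centered trend window)
anomaly = (residuals - residuals.mean()).abs() > 3 * residuals.std()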

The series is decomposed into trend + seasonal + residual; anomalies are points with a large residual.
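To show what happens inside, here is a hand-rolled additive decomposition that follows the classical recipe `seasonal_decompose` is built on: a centered moving average for the trend, per-position averages (centered to sum to zero) for the seasonal part. It is a simplified sketch, not statsmodels' code: it supports odd periods only, the function names are hypothetical, and the data is made up.

```python
from statistics import mean, pstdev


def decompose_additive(values: list[float], period: int = 7):
    """Classical additive decomposition: value = trend + seasonal + residual."""
    if period % 2 == 0:
        raise ValueError("this sketch handles odd periods only")
    n, half = len(values), period // 2
    # Trend: centered moving average over one full period (None at the edges)
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = mean(values[i - half:i + half + 1])
    # Seasonal: mean detrended value for each position in the cycle, centered to sum to zero
    raw = [
        mean(values[i] - trend[i] for i in range(p, n, period) if trend[i] is not None)
        for p in range(period)
    ]
    offset = mean(raw)
    seasonal = [raw[i % period] - offset for i in range(n)]
    # Residual: what trend and seasonality do not explain
    resid = [None if t is None else x - t - s for x, t, s in zip(values, trend, seasonal)]
    return trend, seasonal, resid


def residual_anomalies(values: list[float], period: int = 7, k: float = 3.0) -> list[int]:
    """Flag points whose residual is more than k std away from the mean residual."""
    _, _, resid = decompose_additive(values, period)
    defined = [r for r in resid if r is not None]
    mu, sigma = mean(defined), pstdev(defined)
    return [i for i, r in enumerate(resid) if r is not None and abs(r - mu) > k * sigma]


weekly = [100, 110, 120, 120, 110, 60, 50]  # high on weekdays, low at the weekend
series = [weekly[i % 7] + 0.5 * i for i in range(56)]  # 8 weeks with a mild upward trend
series[30] -= 80  # an incident: a drop on day 30
print(residual_anomalies(series))  # [30]
```

The weekend dips are absorbed by the seasonal component and the trend by the moving average, so only the incident stands out. The drop leaks a little into the trend and into the seasonal estimate for its weekday, which shifts the residuals of nearby days slightly but not past the threshold.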

4. Prophet (Facebook)

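A ready-made time-series forecasting model. Points that fall outside its prediction interval are treated as anomalies; interval_width=0.99 requests a 99% interval.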

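from prophet import Prophet

# df: columns 'ds' (datetime) and 'y' (metric value), the format Prophet expects
model = Prophet(interval_width=0.99)
model.fit(df)
forecast = model.predict(df[['ds']])  # in-sample: score the historical dates themselves
result = df.merge(forecast[['ds', 'yhat_lower', 'yhat_upper']], on='ds')
# Points outside [yhat_lower, yhat_upper] are anomalies
result['anomaly'] = (result['y'] < result['yhat_lower']) | (result['y'] > result['yhat_upper'])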

5. ML-based

Isolation Forest

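from sklearn.ensemble import IsolationForest

# X: 2-D array, one row per observation (e.g. a day), one column per metric
model = IsolationForest(contamination=0.01, random_state=42)  # random_state: reproducible runs
model.fit(X)
anomalies = model.predict(X) == -1  # predict returns -1 for anomalies, 1 for normal points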

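Works on multidimensional data (several metrics at once). Note that contamination is the expected share of anomalies: the model flags roughly that fraction of the training points whether or not real anomalies exist.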

Autoencoders

Neural network reconstructs data. High reconstruction error → anomaly.

Used for complex patterns and time series.

The seasonality problem

Daily data:

  • Thursday revenue < Monday → not anomaly (seasonal)
  • Christmas ↓ → expected

Ignoring seasonality → false alerts.

Fix: seasonal decomposition, or models that account for seasonality.
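A small illustration of why ignoring seasonality produces false alerts, with made-up data and hypothetical helpers. The naive check compares each day with the day before; the seasonality-aware one compares it with the median of the same weekday over the previous 4 weeks (a simple same-weekday baseline, lighter than full decomposition). The 30% threshold and the 4-week lookback are arbitrary assumptions.

```python
from statistics import median


def day_over_day_alerts(values: list[float], max_rel_dev: float = 0.3) -> list[int]:
    """Naive check that ignores seasonality: compare each day with the day before."""
    return [i for i in range(1, len(values))
            if abs(values[i] - values[i - 1]) / values[i - 1] > max_rel_dev]


def same_weekday_alerts(values: list[float], weeks: int = 4, max_rel_dev: float = 0.3) -> list[int]:
    """Seasonality-aware check: compare each day with the median of the same weekday in past weeks."""
    flagged = []
    for i in range(7 * weeks, len(values)):
        baseline = median([values[i - 7 * w] for w in range(1, weeks + 1)])
        if abs(values[i] - baseline) / baseline > max_rel_dev:
            flagged.append(i)
    return flagged


weekly = [100, 110, 120, 120, 110, 60, 50]  # weekdays high, weekend low
series = weekly * 6  # 6 identical weeks
series[36] = 55  # a real incident: a weekday that drops by half
print(day_over_day_alerts(series))  # [5, 7, 12, 14, 19, 21, 26, 28, 33, 35, 36, 37, 40]
print(same_weekday_alerts(series))  # [36]
```

The naive check fires on every transition into and out of the weekend, burying the one real incident among a dozen false alerts; the same-weekday baseline reports only day 36.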

Alert fatigue

If alerts fire too often, people start ignoring them.

  • False positives: legit seasonality flagged as anomaly
  • Tuning: thresholds per metric, per segment

Goal: 80% of alerts are actionable.

System design

1. Choose metrics

Critical metrics: revenue, DAU, error rate.

Secondary metrics: less critical; don't alert on them.

2. Choose baseline

Rolling window + seasonality. Or Prophet / a custom model.

3. Define threshold

  • Strict: 3σ → fewer alerts, but subtle anomalies are missed
  • Loose: 2σ → more alerts, more noise
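A back-of-the-envelope check of this trade-off, assuming the metric is normally distributed, daily points are independent, and nothing is actually wrong. Real metrics are rarely this well-behaved, so treat the numbers as a rough guide; `false_alert_rate` is a hypothetical helper.

```python
from math import erfc, sqrt


def false_alert_rate(k: float) -> float:
    """P(|z| > k) for a standard normal variable: share of normal days flagged by a k-sigma rule."""
    return erfc(k / sqrt(2))


for k in (2, 3):
    rate = false_alert_rate(k)
    print(f"{k}σ: {rate:.2%} of days, ~{rate * 365:.1f} false alerts per year per daily metric")
# 2σ: 4.55% of days, ~16.6 false alerts per year per daily metric
# 3σ: 0.27% of days, ~1.0 false alerts per year per daily metric
```

Multiply by the number of metrics and segments being monitored: 50 daily series at 2σ would give over 800 false alerts a year (about 50 at 3σ), which is how alert fatigue starts.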

4. Routing

Alert → Slack / PagerDuty / email. Decide who receives each alert.

5. Actions

  • Critical: page someone
  • Warning: log, dashboard
  • Informational: email digest
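One way to encode the routing and action rules as code. A sketch only: the metric names, the 3σ cut-off and the channel labels are hypothetical, and in practice the returned label would be handed to whatever Slack / PagerDuty / email integration the team uses.

```python
from dataclasses import dataclass

CRITICAL_METRICS = {"revenue", "dau", "error_rate"}  # hypothetical names for the critical metrics


@dataclass
class Alert:
    metric: str
    z: float  # how many standard deviations the value is from its baseline


def route(alert: Alert) -> str:
    """Map an alert to an action tier (illustrative thresholds and channel labels)."""
    severe = abs(alert.z) > 3
    if severe and alert.metric in CRITICAL_METRICS:
        return "page"          # Critical: page the on-call person
    if severe or alert.metric in CRITICAL_METRICS:
        return "dashboard"     # Warning: log it and show it on a dashboard
    return "email_digest"      # Informational: include it in an email digest


print(route(Alert("revenue", -4.2)))       # page
print(route(Alert("revenue", -2.4)))       # dashboard
print(route(Alert("signup_clicks", 2.5)))  # email_digest
```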

Segments

Not only the aggregate. Also check per segment:

  • Platform
  • Country
  • User type

A drop on iOS only may not show up in the aggregate.
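A toy illustration of this masking effect with made-up numbers; `segment_drops` and the 20% cut-off are hypothetical. iOS loses 40% of its orders, but because it is a small share of the total and the other platforms grow slightly, the aggregate moves by less than 1%.

```python
def pct_change(before: float, after: float) -> float:
    return (after - before) / before


def segment_drops(before: dict[str, float], after: dict[str, float], max_drop: float = 0.2) -> list[str]:
    """Return segments whose metric fell by more than max_drop (e.g. 0.2 = 20%)."""
    return [seg for seg in before if pct_change(before[seg], after[seg]) < -max_drop]


# Orders per platform, yesterday vs today (made-up numbers)
yesterday = {"ios": 2000, "android": 7000, "web": 6000}
today = {"ios": 1200, "android": 7400, "web": 6300}

total = pct_change(sum(yesterday.values()), sum(today.values()))
print(f"total: {total:+.1%}")           # total: -0.7%
print(segment_drops(yesterday, today))  # ['ios']
```

Small segments are noisier, so in practice each segment would get its own baseline and threshold (for example a rolling z-score per segment) rather than one fixed percentage.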

Tools

Commercial

  • Anodot — enterprise anomaly platform
  • DataDog — monitoring + anomalies
  • New Relic — similar

Open-source

  • Grafana + alerts
  • Prometheus + alertmanager
  • Airflow + custom checks

Custom

Python scripts + cron. For small-scale setups.

SQL anomaly detection

WITH stats AS (
    SELECT
        day,
        metric,
        AVG(metric) OVER (ORDER BY day ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS ma7,
        STDDEV(metric) OVER (ORDER BY day ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS std7
    FROM daily
)
SELECT * FROM stats
WHERE ABS(metric - ma7) > 3 * std7;

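A simple rolling z-score check in SQL: the baseline is the previous 7 days, excluding the current one. The first days of the series, where std7 is NULL, are never flagged.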

At the interview

"How would you build anomaly detection?"

  1. Identify critical metrics
  2. Choose baseline (rolling / seasonal)
  3. Set thresholds (3σ default)
  4. Alert routing + actions
  5. Monitor false positive rate, tune

"How do you handle seasonality?"

Seasonal decomposition or Prophet. Don't ignore it.

"What about alert fatigue?"

Balance strict/loose thresholds. Segmentation. Regular tuning.

Common mistakes

One threshold for everything

Different metrics have different baselines and variances.

Ignoring seasonality

False positives on weekends and holidays.

No ops plan

An alert fires: now what? Document runbooks.

Only the aggregate

A drop in one segment gets masked in the overall number.


FAQ

ML or statistical?

Start simple (z-score, rolling window). Use ML only if statistical methods are not enough.

Real-time?

Streaming anomaly detection is possible but complex. Usually it runs in batches (e.g. hourly).

Are false positives OK?

To a degree. A 20% FPR is too high; 5% is acceptable.

