April 23, 2026 · 3 min read

Anomaly detection for analysts

Q: ML or statistical?

Start simple (z-score, rolling window). Move to ML only when statistical methods are not enough.

Q: Real-time?

Streaming anomaly detection is possible but complex. Usually it runs in batch (e.g. hourly).

Q: Are false positives OK?

Up to a point. A 20% false positive rate is too high; 5% is acceptable.

Why it matters

If the team learns about a metric drop three days later, that revenue is already lost. Automated anomaly detection catches drops close to real time. A common question in senior interviews: «How would you build an alert system?»

At large companies (Netflix, Uber), entire teams work on anomaly detection. A mid-level analyst should know at least the basics.

Short explanation

Anomaly detection is the identification of points that deviate significantly from the expected pattern.

Aim: catch drops, spikes, or unusual behavior before it is too late.

Methods

1. Statistical (simple)

Z-score

z = (x - mean) / std

If |z| > 3, the point is flagged as an anomaly.

Simple and fast. Works on roughly normally distributed data.
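The sketch below implements the z-score rule in plain Python, using only the standard library. The function name `zscore_anomalies` and the default threshold of 3.0 are illustrative choices. They are not part of the original text.

```python
from statistics import mean, stdev


def zscore_anomalies(values, threshold=3.0):
    """Return indices of points whose |z| exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # constant series: no point deviates
    # z = (x - mean) / std for every point; keep the extreme ones
    return [i for i, x in enumerate(values) if abs((x - mu) / sigma) > threshold]
```

The mean and std here are computed over the whole series, spike included. As a result, one large outlier inflates σ and partly hides itself. With few points, |z| may never reach 3 at all. For n points, the largest possible |z| (with the sample std) is (n - 1)/√n, which is about 2.85 for n = 10.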

IQR

Q1 = 25th percentile
Q3 = 75th percentile
IQR = Q3 - Q1
Outlier: x < Q1 - 1.5×IQR or x > Q3 + 1.5×IQR

Robust to outliers, because quartiles are not pulled by extreme values. This is the same rule as the box-plot whiskers.
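Here is the same IQR rule as a standard-library sketch. `statistics.quantiles` with `method="inclusive"` uses the same linear interpolation as NumPy's default percentile. Other quartile conventions shift the fences slightly. The function name is hypothetical.

```python
from statistics import quantiles


def iqr_anomalies(values, k=1.5):
    """Return indices of points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # 25th, 50th, 75th percentiles
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, x in enumerate(values) if x < low or x > high]
```

Unlike the z-score, the fences barely move when a single huge value is added, so the spike cannot mask itself.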

2. Rolling window

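# metric: pandas Series of daily values, oldest first
baseline = metric.shift(1).rolling(window=7)  # previous 7 days, today excluded
expected = baseline.mean()
threshold = 3 * baseline.std()
anomaly = (metric - expected).abs() > threshold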

Adapts to trends, since the baseline moves with the data. Popular in monitoring.

3. Seasonal decomposition

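from statsmodels.tsa.seasonal import seasonal_decompose

# data: pandas Series of daily values; period=7 means weekly seasonality
result = seasonal_decompose(data, model='additive', period=7)
residuals = result.resid  # NaN at both edges (the trend is a centered moving average)
anomalies = residuals.abs() > 3 * residuals.std()  # large |residual| = anomaly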

The series is decomposed into trend + seasonal + residual. Anomalies are searched for in the residual.

4. Prophet (Facebook)

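A ready-made time-series forecasting model. It has no dedicated anomaly detector. Instead, points outside the forecast's uncertainty interval are treated as anomalies.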

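from prophet import Prophet

# df: DataFrame with columns 'ds' (datetime) and 'y' (metric value), as Prophet expects
model = Prophet(interval_width=0.99)
model.fit(df)
forecast = model.predict(df[['ds']])  # in-sample fit over the observed dates
merged = df.merge(forecast[['ds', 'yhat_lower', 'yhat_upper']], on='ds')
# Points outside [yhat_lower, yhat_upper] are anomalies
anomalies = merged[(merged['y'] < merged['yhat_lower']) | (merged['y'] > merged['yhat_upper'])]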

5. ML-based

Isolation Forest

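from sklearn.ensemble import IsolationForest

# X: 2-D array, one row per observation, one column per feature/metric
model = IsolationForest(contamination=0.01, random_state=0)  # expect ~1% anomalies
model.fit(X)
anomalies = model.predict(X) == -1  # predict returns -1 for outliers, 1 for inliers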

Works on multidimensional data (several features per observation), not only on a single series.

Autoencoders

A neural network learns to reconstruct the data. A high reconstruction error → anomaly.

Suited to complex patterns and time series.

The seasonality problem

Daily data:

Thursday revenue < Monday revenue → not an anomaly (seasonal)
Christmas dip → expected

Ignoring seasonality → false alerts.

Fix: seasonal decomposition or models that account for seasonality.
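One lightweight way to account for weekly seasonality is a same-weekday baseline. It is offered here as an illustrative alternative, not as the article's method. Each day is compared only with the same weekday in the previous few weeks. A Thursday is then judged against earlier Thursdays, so a normally weak Thursday no longer looks like a drop. The function name and the defaults (`season=7`, `weeks=4`) are assumptions.

```python
from statistics import mean, stdev


def same_weekday_anomalies(values, season=7, weeks=4, threshold=3.0):
    """Flag day i if it deviates from the same weekday in the previous `weeks` weeks.

    values: daily metric, oldest first; season=7 assumes weekly seasonality.
    """
    flagged = []
    for i in range(season * weeks, len(values)):
        # e.g. for a Thursday: the 4 previous Thursdays
        history = [values[i - season * w] for w in range(1, weeks + 1)]
        mu, sigma = mean(history), stdev(history)
        # skip flat history (sigma = 0) to avoid division by zero
        if sigma > 0 and abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged
```

With only four reference points σ is noisy. In practice, more weeks or a minimum σ floor may be needed. Holidays such as Christmas still need separate handling.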

Alert fatigue

If alerts fire too often, people start ignoring them.

False positives: legitimate seasonal patterns flagged as anomalies
Tuning: thresholds per metric and per segment

Goal: 80% of alerts should be actionable.

System design

1. Choose metrics

Critical metrics: revenue, DAU, error rate.

Secondary: less critical metrics, no alerts.

2. Choose baseline

Rolling window + seasonality, or Prophet / a custom model.

3. Define threshold

Strict: 3σ → fewer alerts, but subtle changes are missed
Loose: 2σ → more alerts, more noise
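The trade-off can be put in numbers under the z-score's own assumption of normally distributed noise. The share of perfectly normal points beyond kσ, counting both tails, is 2·(1 - Φ(k)). This is a sketch. Real metrics often have heavier tails, so observed false alarm rates tend to be higher.

```python
from statistics import NormalDist


def expected_fpr(k):
    """Two-sided share of normal (non-anomalous) points beyond k sigma, assuming Gaussian noise."""
    return 2 * (1 - NormalDist().cdf(k))


for k in (2, 3):
    rate = expected_fpr(k)
    # Rough count of false alerts per year for one daily metric
    print(f"{k} sigma: {rate:.2%} of points, ~{rate * 365:.1f} false alerts/year per metric")
```

At 2σ that is about 4.6% of points, or roughly 17 false alerts a year for one daily metric. At 3σ it is about 0.27%, roughly one a year. The count multiplies with every metric and segment monitored.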

4. Routing

Alert → Slack / PagerDuty / email. Decide who receives each alert.

5. Actions

Critical: page someone
Warning: log, dashboard
Informational: email digest
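The sketch below is a hypothetical routing policy in code, showing how severity levels map to channels. The tiers, thresholds, and channel names (`pagerduty`, `dashboard`, `email_digest`) are illustrative placeholders, not a real integration. In this sketch only critical metrics can page anyone, and secondary metrics at most land in the digest.

```python
# Hypothetical channel names; real targets (Slack, PagerDuty, email) depend on your stack.
ROUTES = {
    "critical": "pagerduty",
    "warning": "dashboard",
    "info": "email_digest",
}


def classify(z, is_critical_metric):
    """Hypothetical policy mapping |z| and metric tier to an alert level (or None)."""
    if is_critical_metric and abs(z) > 3:
        return "critical"  # e.g. revenue drop: page someone
    if is_critical_metric and abs(z) > 2:
        return "warning"   # log it, show it on the dashboard
    if abs(z) > 3:
        return "info"      # secondary metric: goes into the digest, never pages
    return None            # within normal range: no alert


def route(z, is_critical_metric):
    """Return (level, channel), or None when no alert should fire."""
    level = classify(z, is_critical_metric)
    return None if level is None else (level, ROUTES[level])
```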

Segments

Not only the aggregate. Check per segment:

Platform
Country
User type

A drop on iOS alone may not show up in the aggregate.
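Here is a toy example with made-up numbers. iOS revenue halves while Android and Web grow slightly, so the total barely moves. The helper names and the 20% drop threshold are assumptions for illustration.

```python
def pct_change(today, baseline):
    """Relative change vs. the baseline value."""
    return (today - baseline) / baseline


def segment_drops(baseline, today, max_drop=0.2):
    """Return segments whose value fell by more than max_drop vs. their own baseline."""
    return sorted(
        seg for seg in baseline
        if pct_change(today[seg], baseline[seg]) < -max_drop
    )


# Illustrative numbers only (one hypothetical metric, e.g. daily revenue)
baseline = {"ios": 100, "android": 400, "web": 500}
today = {"ios": 50, "android": 420, "web": 520}

total_change = pct_change(sum(today.values()), sum(baseline.values()))
print(f"total: {total_change:+.1%}")
print("dropped segments:", segment_drops(baseline, today))
```

Here the total changes by only -1%, while the per-segment check flags `ios` at -50%.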

Tools

Commercial

Anodot — enterprise anomaly platform
DataDog — monitoring + anomalies
New Relic — similar

Open-source

Grafana + alerts
Prometheus + alertmanager
Airflow + custom checks

Custom

Python scripts + cron. Good enough for small scale.

SQL anomaly detection

WITH stats AS (
    SELECT
        day,
        metric,
        AVG(metric) OVER (ORDER BY day ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS ma7,
        STDDEV(metric) OVER (ORDER BY day ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS std7
    FROM daily
)
SELECT * FROM stats
WHERE ABS(metric - ma7) > 3 * std7;

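A simple rolling z-score check in SQL. The window covers the previous 7 days and excludes the current one, so an anomaly does not inflate its own baseline.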

In the interview

«How would you build anomaly detection?»

Identify critical metrics
Choose a baseline (rolling / seasonal)
Set thresholds (3σ by default)
Alert routing + actions
Monitor the false positive rate and tune

«How do you handle seasonality?»

Seasonal decomposition or Prophet. Don't ignore it.

«What about alert fatigue?»

Balance strict and loose thresholds, segment the metrics, and retune regularly.

Common mistakes

One threshold for everything

Different metrics have different baselines and variances.

Ignoring seasonality

False positives on weekends and holidays.

No ops plan

An alert fires: what happens next? Document runbooks.

Only the aggregate

A drop in one segment gets masked in the overall number.
