Как посчитать SRM в SQL
Содержание:
Зачем SRM
SRM (Sample Ratio Mismatch) — A и B вариант имеют sample отличающийся от expected ratio (e.g. 50/50). Если есть SRM — test broken, results unreliable. SRM check — обязательная гvarная gate.
Формула
Chi-square test:
χ² = Σ ((observed - expected)² / expected)Если p-value < 0.001 (или 0.005) → SRM detected.
Базовый расчёт
WITH observed AS (
SELECT
variant,
COUNT(DISTINCT user_id) AS n
FROM ab_test_assignments
WHERE test_id = 123
GROUP BY variant
),
total AS (
SELECT SUM(n) AS total_n FROM observed
)
SELECT
o.variant,
o.n AS observed,
t.total_n * 0.5 AS expected, -- 50/50 split
POW(o.n - t.total_n * 0.5, 2) / (t.total_n * 0.5) AS chi_sq_contribution
FROM observed o, total t
ORDER BY o.variant;Sum the chi_sq_contributions. Compare to chi-square critical value (df=1 при 2 variants): 10.83 = p=0.001.
Когда SRM critical
При 50/50:
- N=10,000: 5050/4950 — нормально (chi² = 1.0, p ≈ 0.32)
- N=10,000: 5300/4700 — SRM (chi² = 36, p < 0.0001)
«Допустимый» drift зависит от sample. С большим sample даже 1% diff = SRM.
Причины SRM
- Бот трафик не одинаково попадает в variants
- Bug в bucket logic — assignment is biased
- Tracking gap — events дропаются у одной variant
- Caching — старые users в legacy bucket
- Crash — variant B crashes → fewer users tracked
-- Drill: SRM by date
SELECT
DATE_TRUNC('day', assigned_at) AS day,
variant,
COUNT(*) AS users,
COUNT(*) FILTER (WHERE variant = 'A')::NUMERIC * 100 / COUNT(*) OVER (PARTITION BY DATE_TRUNC('day', assigned_at)) AS pct_a
FROM ab_test_assignments
WHERE test_id = 123
GROUP BY DATE_TRUNC('day', assigned_at), variant
ORDER BY day, variant;Если SRM появилась после Day X — investigate that date's deploy.
SRM по segments
SRM overall ok, но в conditional bucket — broken:
SELECT
u.country,
COUNT(*) FILTER (WHERE a.variant = 'A') AS a_users,
COUNT(*) FILTER (WHERE a.variant = 'B') AS b_users,
COUNT(*) FILTER (WHERE a.variant = 'A')::NUMERIC * 100 / COUNT(*) AS pct_a
FROM ab_test_assignments a
JOIN users u ON u.user_id = a.user_id
WHERE a.test_id = 123
GROUP BY u.country
HAVING ABS(COUNT(*) FILTER (WHERE a.variant = 'A')::NUMERIC * 100 / COUNT(*) - 50) > 3
ORDER BY 4;Частые ошибки
Ошибка 1. Ignoring SRM. SRM = data quality issue. Don't trust significant results.
Ошибка 2. Threshold too loose. α = 0.05 для SRM → many false alarms. Use 0.001 / 0.005.
Ошибка 3. SRM only at start. Recheck SRM periodically. Drift comes mid-test.
Ошибка 4. SRM by segment ignored. Overall ok, но certain segment skewed. Re-stratify.
Ошибка 5. SRM в triggered experiments. Trigger logic biased per variant. SRM в pre-exposure pool.
Связанные темы
- Как посчитать p-value в SQL
- Как посчитать sample size в SQL
- Как посчитать MDE в SQL
- Как посчитать statistical power в SQL
FAQ
Какой SRM threshold?
p < 0.001 / 0.005. Strict because critical guardrail.
SRM есть — что делать?
Pause test, debug assignment / tracking. Re-launch with fix.
SRM один раз — relaunch?
Yes. Don't trust biased data. Cost of fix < cost of wrong decision.
SRM в triggered exp?
Pre-exposure SRM — check trigger logic. Sometimes triggering biased.
Auto-SRM tools?
Statsig, Optimizely, Eppo все встроили. Mandatory feature 2024+.