SQL для A/B-тестов
Карьерник — квиз-тренажёр в Telegram с 1500+ вопросами для собесов аналитика. SQL, Python, A/B, метрики. Бесплатно.
Зачем это знать
A/B-тест ran — нужно analyze. Много компаний stats делают в SQL или экспортируют в Python. Умение SQL-анализа A/B — обязательно для middle product analyst.
Basic analysis
Conversion rate per variant
SELECT
variant,
COUNT(*) AS users,
SUM(converted) AS conversions,
AVG(converted) AS cr,
AVG(converted) * 100 AS cr_pct
FROM experiment
GROUP BY variant;Lift calculation
WITH stats AS (
SELECT
AVG(CASE WHEN variant = 'control' THEN converted END) AS control_cr,
AVG(CASE WHEN variant = 'test' THEN converted END) AS test_cr
FROM experiment
)
SELECT
control_cr,
test_cr,
test_cr - control_cr AS abs_diff,
(test_cr - control_cr) / control_cr AS relative_lift,
(test_cr - control_cr) / control_cr * 100 AS lift_pct
FROM stats;Standard error
Standard error разницы:
WITH stats AS (
SELECT
variant,
AVG(converted) AS p,
COUNT(*) AS n
FROM experiment
GROUP BY variant
),
pivoted AS (
SELECT
MAX(CASE WHEN variant = 'control' THEN p END) AS p_c,
MAX(CASE WHEN variant = 'control' THEN n END) AS n_c,
MAX(CASE WHEN variant = 'test' THEN p END) AS p_t,
MAX(CASE WHEN variant = 'test' THEN n END) AS n_t
FROM stats
)
SELECT
p_c, p_t,
p_t - p_c AS diff,
SQRT(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t) AS se,
(p_t - p_c) / SQRT(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t) AS z_stat
FROM pivoted;P-value
Для two-tailed:
-- z_stat found выше
-- p-value = 2 × (1 - Φ(|z|))
-- В SQL approximation через Normal CDF
-- В Postgres:
-- Нет стандартной функции NORMAL_CDF, нужно workaround.
-- Часто — export в Python для этого.Simpler approach: check if |z| > 1.96 (for 95% CI):
CASE WHEN ABS(z_stat) > 1.96 THEN 'significant' ELSE 'NOT' END AS resultConfidence interval
SELECT
diff - 1.96 * se AS ci_lower,
diff + 1.96 * se AS ci_upper
FROM pivoted;Если CI не contains 0 → significant.
Continuous metrics
Для revenue (не conversion):
WITH stats AS (
SELECT
variant,
AVG(revenue) AS mean_rev,
STDDEV(revenue) AS std_rev,
COUNT(*) AS n
FROM experiment
GROUP BY variant
)
-- T-test-like через SQL
SELECT
MAX(CASE WHEN variant = 'test' THEN mean_rev END) -
MAX(CASE WHEN variant = 'control' THEN mean_rev END) AS diff,
SQRT(
MAX(CASE WHEN variant = 'test' THEN std_rev * std_rev / n END) +
MAX(CASE WHEN variant = 'control' THEN std_rev * std_rev / n END)
) AS se
FROM stats;Segmentation
CR per segment:
SELECT
variant,
platform,
AVG(converted) AS cr,
COUNT(*) AS n
FROM experiment
GROUP BY variant, platform
ORDER BY platform, variant;Discover differential effects.
Guardrail metrics
Not just primary. Check secondary:
SELECT
variant,
AVG(converted) AS cr,
AVG(revenue) AS arpu,
AVG(refund) AS refund_rate,
AVG(session_count) AS avg_sessions
FROM experiment
GROUP BY variant;Treatment может improve primary, но hurt guardrail (revenue up, but refunds up too).
Time series effect
Per day:
SELECT
DATE(assigned_at) AS day,
variant,
COUNT(*) AS users,
AVG(converted) AS cr
FROM experiment
GROUP BY 1, 2
ORDER BY 1, 2;Plot over time — check для novelty effect.
Multiple testing
10 metrics tested simultaneously → FPR inflates.
Bonferroni: divide α by number tests.
-- If 10 metrics, threshold становится 0.05/10 = 0.005
-- So need |z| > 2.81 вместо 1.96Power check
После тест — чекаем, hathalad sufficient power:
WITH stats AS (
SELECT
variant,
AVG(converted) AS cr,
COUNT(*) AS n
FROM experiment
GROUP BY variant
)
SELECT
MAX(n) AS max_group_size,
-- Approximate MDE detected с current N
1.96 * SQRT(2 * 0.1 * 0.9 / MAX(n)) AS detectable_diff
FROM stats;Sample Ratio Mismatch (SRM)
Assignment 50/50 supposed → but actually?
SELECT
variant,
COUNT(*) AS users,
COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS pct
FROM experiment
GROUP BY variant;If pct не 50/50 (e.g., 53/47) → SRM issue. Exclude test from analysis.
Chi-square test detect SRM.
CUPED через SQL
Pre-experiment covariate → reduced variance:
WITH theta_calc AS (
SELECT
COVAR_POP(post_metric, pre_metric) / VAR_POP(pre_metric) AS theta,
AVG(pre_metric) AS mean_pre
FROM experiment
)
SELECT
variant,
AVG(post_metric - t.theta * (pre_metric - t.mean_pre)) AS cuped_metric
FROM experiment, theta_calc t
GROUP BY variant;На собесе
«Tест launched. CR control=10%, test=11%. Significant?»
Walk через:
- Compute actual numbers
- Standard error разницы
- Z-statistic
- Compare к threshold
Show knowledge + SQL syntax.
Частые ошибки
Only primary metric
Check guardrails.
Ignore segments
Heterogeneity может hide effects.
No SRM check
Silent issue.
Misinterpret p-value
«p < 0.05 → big effect». No — statistical, не practical.
Связанные темы
- Как посчитать конверсию в SQL
- A/B-тест простыми словами
- p-value простыми словами
- CUPED простыми словами
FAQ
Python обязателен?
Для более precise stats — да. SQL для quick checks.
Sequential в SQL?
Complex. Обычно Python.
Scale analysis?
SQL fine для millions rows. Billions — specialized tools.
Тренируйте A/B — откройте тренажёр с 1500+ вопросами для собесов.