SQL для A/B-тестов

Карьерник — квиз-тренажёр в Telegram с 1500+ вопросами для собесов аналитика. SQL, Python, A/B, метрики. Бесплатно.

Зачем это знать

A/B-тест ran — нужно analyze. Много компаний stats делают в SQL или экспортируют в Python. Умение SQL-анализа A/B — обязательно для middle product analyst.

Basic analysis

Conversion rate per variant

SELECT
    variant,
    COUNT(*) AS users,
    SUM(converted) AS conversions,
    AVG(converted) AS cr,
    AVG(converted) * 100 AS cr_pct
FROM experiment
GROUP BY variant;

Lift calculation

WITH stats AS (
    SELECT
        AVG(CASE WHEN variant = 'control' THEN converted END) AS control_cr,
        AVG(CASE WHEN variant = 'test' THEN converted END) AS test_cr
    FROM experiment
)
SELECT
    control_cr,
    test_cr,
    test_cr - control_cr AS abs_diff,
    (test_cr - control_cr) / control_cr AS relative_lift,
    (test_cr - control_cr) / control_cr * 100 AS lift_pct
FROM stats;

Standard error

Standard error разницы:

WITH stats AS (
    SELECT
        variant,
        AVG(converted) AS p,
        COUNT(*) AS n
    FROM experiment
    GROUP BY variant
),
pivoted AS (
    SELECT
        MAX(CASE WHEN variant = 'control' THEN p END) AS p_c,
        MAX(CASE WHEN variant = 'control' THEN n END) AS n_c,
        MAX(CASE WHEN variant = 'test' THEN p END) AS p_t,
        MAX(CASE WHEN variant = 'test' THEN n END) AS n_t
    FROM stats
)
SELECT
    p_c, p_t,
    p_t - p_c AS diff,
    SQRT(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t) AS se,
    (p_t - p_c) / SQRT(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t) AS z_stat
FROM pivoted;

P-value

Для two-tailed:

-- z_stat found выше
-- p-value = 2 × (1 - Φ(|z|))
-- В SQL approximation через Normal CDF

-- В Postgres:
-- Нет стандартной функции NORMAL_CDF, нужно workaround.
-- Часто — export в Python для этого.

Simpler approach: check if |z| > 1.96 (for 95% CI):

CASE WHEN ABS(z_stat) > 1.96 THEN 'significant' ELSE 'NOT' END AS result

Confidence interval

SELECT
    diff - 1.96 * se AS ci_lower,
    diff + 1.96 * se AS ci_upper
FROM pivoted;

Если CI не contains 0 → significant.

Continuous metrics

Для revenue (не conversion):

WITH stats AS (
    SELECT
        variant,
        AVG(revenue) AS mean_rev,
        STDDEV(revenue) AS std_rev,
        COUNT(*) AS n
    FROM experiment
    GROUP BY variant
)
-- T-test-like через SQL

SELECT
    MAX(CASE WHEN variant = 'test' THEN mean_rev END) -
    MAX(CASE WHEN variant = 'control' THEN mean_rev END) AS diff,
    SQRT(
        MAX(CASE WHEN variant = 'test' THEN std_rev * std_rev / n END) +
        MAX(CASE WHEN variant = 'control' THEN std_rev * std_rev / n END)
    ) AS se
FROM stats;

Segmentation

CR per segment:

SELECT
    variant,
    platform,
    AVG(converted) AS cr,
    COUNT(*) AS n
FROM experiment
GROUP BY variant, platform
ORDER BY platform, variant;

Discover differential effects.

Guardrail metrics

Not just primary. Check secondary:

SELECT
    variant,
    AVG(converted) AS cr,
    AVG(revenue) AS arpu,
    AVG(refund) AS refund_rate,
    AVG(session_count) AS avg_sessions
FROM experiment
GROUP BY variant;

Treatment может improve primary, но hurt guardrail (revenue up, but refunds up too).

Time series effect

Per day:

SELECT
    DATE(assigned_at) AS day,
    variant,
    COUNT(*) AS users,
    AVG(converted) AS cr
FROM experiment
GROUP BY 1, 2
ORDER BY 1, 2;

Plot over time — check для novelty effect.

Multiple testing

10 metrics tested simultaneously → FPR inflates.

Bonferroni: divide α by number tests.

-- If 10 metrics, threshold становится 0.05/10 = 0.005
-- So need |z| > 2.81 вместо 1.96

Power check

После тест — чекаем, hathalad sufficient power:

WITH stats AS (
    SELECT
        variant,
        AVG(converted) AS cr,
        COUNT(*) AS n
    FROM experiment
    GROUP BY variant
)
SELECT
    MAX(n) AS max_group_size,
    -- Approximate MDE detected с current N
    1.96 * SQRT(2 * 0.1 * 0.9 / MAX(n)) AS detectable_diff
FROM stats;

Sample Ratio Mismatch (SRM)

Assignment 50/50 supposed → but actually?

SELECT
    variant,
    COUNT(*) AS users,
    COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS pct
FROM experiment
GROUP BY variant;

If pct не 50/50 (e.g., 53/47) → SRM issue. Exclude test from analysis.

Chi-square test detect SRM.

CUPED через SQL

Pre-experiment covariate → reduced variance:

WITH theta_calc AS (
    SELECT
        COVAR_POP(post_metric, pre_metric) / VAR_POP(pre_metric) AS theta,
        AVG(pre_metric) AS mean_pre
    FROM experiment
)
SELECT
    variant,
    AVG(post_metric - t.theta * (pre_metric - t.mean_pre)) AS cuped_metric
FROM experiment, theta_calc t
GROUP BY variant;

На собесе

«Tест launched. CR control=10%, test=11%. Significant?»

Walk через:

  1. Compute actual numbers
  2. Standard error разницы
  3. Z-statistic
  4. Compare к threshold

Show knowledge + SQL syntax.

Частые ошибки

Only primary metric

Check guardrails.

Ignore segments

Heterogeneity может hide effects.

No SRM check

Silent issue.

Misinterpret p-value

«p < 0.05 → big effect». No — statistical, не practical.

Связанные темы

FAQ

Python обязателен?

Для более precise stats — да. SQL для quick checks.

Sequential в SQL?

Complex. Обычно Python.

Scale analysis?

SQL fine для millions rows. Billions — specialized tools.


Тренируйте A/B — откройте тренажёр с 1500+ вопросами для собесов.