Как посчитать p-value в SQL

Закрепи формулу p value в Карьернике
Запомнить надолго — 5 коротких сессий с задачами на эту тему. Бесплатно
Тренировать p value в Telegram

Зачем p-value

p-value = вероятность увидеть данный (или больший) effect, если null hypothesis истинна. Низкий p-value → reject H₀. p < 0.05 = «statistically significant» при α = 0.05.

z-test для пропорций

Для конверсий (A vs B):

WITH stats AS (
    SELECT
        COUNT(DISTINCT user_id) FILTER (WHERE variant = 'A') AS n_a,
        COUNT(DISTINCT user_id) FILTER (WHERE variant = 'A' AND converted) AS x_a,
        COUNT(DISTINCT user_id) FILTER (WHERE variant = 'B') AS n_b,
        COUNT(DISTINCT user_id) FILTER (WHERE variant = 'B' AND converted) AS x_b
    FROM ab_test
    WHERE test_id = 123
),
calc AS (
    SELECT
        x_a::NUMERIC / n_a AS p_a,
        x_b::NUMERIC / n_b AS p_b,
        (x_a + x_b)::NUMERIC / (n_a + n_b) AS p_pool,
        n_a, n_b, x_a, x_b
    FROM stats
)
SELECT
    p_a,
    p_b,
    p_b - p_a AS effect_size,
    (p_b - p_a) / SQRT(p_pool * (1 - p_pool) * (1.0/n_a + 1.0/n_b)) AS z_stat,
    -- p-value (two-tailed): 2 * (1 - Φ(|z|))
    -- В PostgreSQL: используем нормальное распределение
    2 * (1 - NORMAL_CDF(ABS((p_b - p_a) / SQRT(p_pool * (1 - p_pool) * (1.0/n_a + 1.0/n_b))))) AS p_value
FROM calc;

Если NORMAL_CDF не доступен, можно через ERF:

Φ(z) = 0.5 × (1 + erf(z / sqrt(2)))

t-test для continuous

Для AOV / revenue (continuous):

WITH a AS (
    SELECT
        AVG(amount) AS mean_a,
        VAR_SAMP(amount) AS var_a,
        COUNT(*) AS n_a
    FROM ab_test_arpu
    WHERE variant = 'A'
),
b AS (
    SELECT
        AVG(amount) AS mean_b,
        VAR_SAMP(amount) AS var_b,
        COUNT(*) AS n_b
    FROM ab_test_arpu
    WHERE variant = 'B'
)
SELECT
    a.mean_a,
    b.mean_b,
    b.mean_b - a.mean_a AS effect,
    -- Welch's t-stat
    (b.mean_b - a.mean_a) / SQRT(a.var_a/a.n_a + b.var_b/b.n_b) AS t_stat
FROM a, b;

p-value через t-distribution — обычно considered в Python (scipy.stats.t.sf).

Двусторонний vs односторонний

-- Two-tailed: are A and B different?
-- p = 2 × (1 - Φ(|z|))

-- One-tailed: is B better than A?
-- p = 1 - Φ(z)
-- (only valid if you committed to direction pre-test)

Most tests two-tailed. One-tailed только если есть pre-registered hypothesis.

Закрепи формулу p value в Карьернике
Запомнить надолго — 5 коротких сессий с задачами на эту тему. Бесплатно
Тренировать p value в Telegram

Интерпретация

p < 0.05 — significant. p > 0.05 — not significant. Но:

  • p = 0.04 vs p = 0.06 — basically same evidence
  • Effect size ≠ p-value
  • Significant ≠ practically meaningful

Частые ошибки

Ошибка 1. p < 0.05 = «B лучше». p-value не говорит «B лучше». Reject null = different (could be A better если effect negative).

Ошибка 2. p > 0.05 = «no difference». Absence of evidence ≠ evidence of absence. Could be underpowered.

Ошибка 3. Multiple comparisons. 5 metrics @ α=0.05 → false positive rate 23%. Bonferroni / Holm correction.

Ошибка 4. Peeking. Check p-value daily, stop при p<0.05 → inflated false positive rate. Sequential testing нужен.

Ошибка 5. Non-normal data. t-test assumes ~normal. Skewed revenue (whales) → bootstrap или non-param.

Связанные темы

FAQ

p < 0.05 — gold standard?

Convention. Some fields use 0.01. Decide upfront.

NORMAL_CDF в PostgreSQL?

Нет встроенной. Использовать erf или approximation. Или вычислять в Python.

p-value vs confidence interval?

CI more informative. «Effect 2-5%» лучше «p<0.05».

Bonferroni correction?

α_adjusted = α / k (k = number of comparisons). Conservative.

Bootstrap p-value?

Non-parametric. Re-sample with replacement, compute statistic distribution.