Как посчитать p-value в SQL
Содержание:
Зачем p-value
p-value = вероятность увидеть данный (или больший) effect, если null hypothesis истинна. Низкий p-value → reject H₀. p < 0.05 = «statistically significant» при α = 0.05.
z-test для пропорций
Для конверсий (A vs B):
WITH stats AS (
SELECT
COUNT(DISTINCT user_id) FILTER (WHERE variant = 'A') AS n_a,
COUNT(DISTINCT user_id) FILTER (WHERE variant = 'A' AND converted) AS x_a,
COUNT(DISTINCT user_id) FILTER (WHERE variant = 'B') AS n_b,
COUNT(DISTINCT user_id) FILTER (WHERE variant = 'B' AND converted) AS x_b
FROM ab_test
WHERE test_id = 123
),
calc AS (
SELECT
x_a::NUMERIC / n_a AS p_a,
x_b::NUMERIC / n_b AS p_b,
(x_a + x_b)::NUMERIC / (n_a + n_b) AS p_pool,
n_a, n_b, x_a, x_b
FROM stats
)
SELECT
p_a,
p_b,
p_b - p_a AS effect_size,
(p_b - p_a) / SQRT(p_pool * (1 - p_pool) * (1.0/n_a + 1.0/n_b)) AS z_stat,
-- p-value (two-tailed): 2 * (1 - Φ(|z|))
-- В PostgreSQL: используем нормальное распределение
2 * (1 - NORMAL_CDF(ABS((p_b - p_a) / SQRT(p_pool * (1 - p_pool) * (1.0/n_a + 1.0/n_b))))) AS p_value
FROM calc;Если NORMAL_CDF не доступен, можно через ERF:
Φ(z) = 0.5 × (1 + erf(z / sqrt(2)))t-test для continuous
Для AOV / revenue (continuous):
WITH a AS (
SELECT
AVG(amount) AS mean_a,
VAR_SAMP(amount) AS var_a,
COUNT(*) AS n_a
FROM ab_test_arpu
WHERE variant = 'A'
),
b AS (
SELECT
AVG(amount) AS mean_b,
VAR_SAMP(amount) AS var_b,
COUNT(*) AS n_b
FROM ab_test_arpu
WHERE variant = 'B'
)
SELECT
a.mean_a,
b.mean_b,
b.mean_b - a.mean_a AS effect,
-- Welch's t-stat
(b.mean_b - a.mean_a) / SQRT(a.var_a/a.n_a + b.var_b/b.n_b) AS t_stat
FROM a, b;p-value через t-distribution — обычно considered в Python (scipy.stats.t.sf).
Двусторонний vs односторонний
-- Two-tailed: are A and B different?
-- p = 2 × (1 - Φ(|z|))
-- One-tailed: is B better than A?
-- p = 1 - Φ(z)
-- (only valid if you committed to direction pre-test)Most tests two-tailed. One-tailed только если есть pre-registered hypothesis.
Интерпретация
p < 0.05 — significant. p > 0.05 — not significant. Но:
- p = 0.04 vs p = 0.06 — basically same evidence
- Effect size ≠ p-value
- Significant ≠ practically meaningful
Частые ошибки
Ошибка 1. p < 0.05 = «B лучше». p-value не говорит «B лучше». Reject null = different (could be A better если effect negative).
Ошибка 2. p > 0.05 = «no difference». Absence of evidence ≠ evidence of absence. Could be underpowered.
Ошибка 3. Multiple comparisons. 5 metrics @ α=0.05 → false positive rate 23%. Bonferroni / Holm correction.
Ошибка 4. Peeking. Check p-value daily, stop при p<0.05 → inflated false positive rate. Sequential testing нужен.
Ошибка 5. Non-normal data. t-test assumes ~normal. Skewed revenue (whales) → bootstrap или non-param.
Связанные темы
- Как посчитать confidence interval в SQL
- Как посчитать MDE в SQL
- Как посчитать sample size в SQL
- Как посчитать statistical power в SQL
FAQ
p < 0.05 — gold standard?
Convention. Some fields use 0.01. Decide upfront.
NORMAL_CDF в PostgreSQL?
Нет встроенной. Использовать erf или approximation. Или вычислять в Python.
p-value vs confidence interval?
CI more informative. «Effect 2-5%» лучше «p<0.05».
Bonferroni correction?
α_adjusted = α / k (k = number of comparisons). Conservative.
Bootstrap p-value?
Non-parametric. Re-sample with replacement, compute statistic distribution.