Как посчитать Statistical Power в SQL
Содержание:
Зачем Power
Power = 1 - β = вероятность detect real effect (true positive rate). Convention: 80%. Без enough power null result inconclusive.
Формула
Для proportions:
Power = Φ(z_α/2 - δ × sqrt(n / (2 × p × (1-p)))) + extra_correctionsSimplification:
Power ≈ Φ(δ × sqrt(n/(2pq)) - z_α/2)Базовый расчёт
Допустим baseline 5%, expected lift 10% relative (effect = 0.005), n = 10,000 per variant:
WITH params AS (
SELECT
0.05 AS baseline,
0.055 AS treatment,
10000 AS n_per_variant,
1.96 AS z_alpha_2
)
SELECT
baseline,
treatment,
treatment - baseline AS delta,
n_per_variant,
(treatment - baseline) * SQRT(n_per_variant / (2.0 * baseline * (1-baseline))) - z_alpha_2 AS z_power
-- Power = Φ(z_power)
FROM params;z_power ≈ 0.5. Φ(0.5) ≈ 0.69 → 69% power. Underpowered (need 80%).
Power curve
Как power растёт с sample:
WITH ns AS (
SELECT unnest(ARRAY[1000, 5000, 10000, 20000, 50000, 100000]) AS n
),
calc AS (
SELECT
n,
0.05 AS baseline,
0.055 AS treatment,
1.96 AS z_alpha
FROM ns
)
SELECT
n,
(treatment - baseline) * SQRT(n / (2.0 * baseline * (1-baseline))) - z_alpha AS z_power
FROM calc;Sample doubles → z_power increases by sqrt(2).
Post-hoc power
Power после теста (controversial — many statisticians не recommend):
WITH observed AS (
SELECT
COUNT(DISTINCT user_id) FILTER (WHERE variant = 'A') AS n_a,
COUNT(DISTINCT user_id) FILTER (WHERE variant = 'A' AND converted) AS x_a,
COUNT(DISTINCT user_id) FILTER (WHERE variant = 'B') AS n_b,
COUNT(DISTINCT user_id) FILTER (WHERE variant = 'B' AND converted) AS x_b
FROM ab_test
)
SELECT
x_a::NUMERIC / n_a AS p_a,
x_b::NUMERIC / n_b AS p_b,
n_a, n_b,
-- Post-hoc power calc
ABS((x_b::NUMERIC/n_b) - (x_a::NUMERIC/n_a)) * SQRT(n_a / (2.0 * (x_a::NUMERIC/n_a) * (1 - (x_a::NUMERIC/n_a)))) - 1.96 AS z_power
FROM observed;Частые ошибки
Ошибка 1. Underpowered tests. Power 50% — coin flip. Wasteful.
Ошибка 2. Post-hoc power не actionable. Если result negative, post-hoc power = function of p-value. Tautology.
Ошибка 3. Power для wrong metric. Powered для primary metric, не для guardrails / secondary.
Ошибка 4. Variance overlooked. Continuous: power depends on σ. Underestimate σ → underpowered.
Ошибка 5. Multiple variants. 4-arm test → divide sample by 4 → power halves.
Связанные темы
- Как посчитать MDE в SQL
- Как посчитать sample size в SQL
- Как посчитать p-value в SQL
- Как посчитать confidence interval в SQL
FAQ
Какой Power ok?
80% convention. 90-95% для high-stakes decisions.
Power vs significance?
Power = 1-β = true positive rate. Significance = 1-α = true negative rate.
Increase power как?
- Larger sample. 2) Bigger expected effect. 3) Lower σ (cleaner metric). 4) Stronger α (но trade-off).
Power calculator vs SQL?
SQL для быстрых проверок. Real planning — calculator (statsig, evan miller).
Power decay в multi-armed test?
Yes — each comparison gets fewer users. Sample carefully.