Bootstrap in A/B tests
Why this matters
Revenue per user is roughly log-normally distributed. The t-test assumes normality, so on small samples it can mislead. Bootstrap is a non-parametric alternative.
Bootstrap comes up often in middle+ interviews and in real A/B analysis, so it is worth knowing.
What bootstrap is
Resample the observed data with replacement to estimate the distribution of a statistic.
Simple and powerful.
Algorithm
- Start with the observed data: N values
- Sample N values with replacement from the observed data to get a bootstrap sample
- Compute the statistic (mean, median, etc.) on that sample
- Repeat 1,000 to 10,000 times
- The distribution of the computed statistics approximates the sampling distribution
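In Python

import numpy as np

def bootstrap_mean(data, n_iter=10000):
    """Return n_iter bootstrap replicates of the sample mean."""
    means = []
    for _ in range(n_iter):
        # Resample N values with replacement from the observed data
        sample = np.random.choice(data, size=len(data), replace=True)
        means.append(sample.mean())
    return np.array(means)

# Usage: synthetic skewed data
data = np.random.exponential(scale=10, size=1000)
boot_means = bootstrap_mean(data)

# 95% percentile confidence interval for the mean
print(np.percentile(boot_means, [2.5, 97.5]))

For A/B tests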
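Compare two groups by resampling each one independently:

def bootstrap_diff(control, treatment, n_iter=10000):
    """Bootstrap replicates of mean(treatment) - mean(control)."""
    diffs = []
    for _ in range(n_iter):
        c_sample = np.random.choice(control, len(control), replace=True)
        t_sample = np.random.choice(treatment, len(treatment), replace=True)
        diffs.append(t_sample.mean() - c_sample.mean())
    return np.array(diffs)

# Usage: control_revenue and treatment_revenue are 1-D NumPy arrays
# of per-user revenue, assumed to be loaded elsewhere
diffs = bootstrap_diff(control_revenue, treatment_revenue)

# 95% CI for the difference in means
ci_lower, ci_upper = np.percentile(diffs, [2.5, 97.5])

# Approximate two-sided p-value for H0: diff = 0,
# from the share of bootstrap differences on each side of zero
p_value = 2 * min(
    np.mean(diffs > 0),
    np.mean(diffs < 0)
)

Permutation test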
A related method, and a more principled way to test the null hypothesis:

def permutation_test(control, treatment, n_iter=10000):
    """Two-sided permutation p-value for the difference in means."""
    observed_diff = treatment.mean() - control.mean()
    combined = np.concatenate([control, treatment])
    n_c = len(control)
    null_diffs = []
    for _ in range(n_iter):
        # Shuffling group labels simulates H0: both groups share one distribution
        np.random.shuffle(combined)
        new_c = combined[:n_c]
        new_t = combined[n_c:]
        null_diffs.append(new_t.mean() - new_c.mean())
    # Share of shuffled differences at least as extreme as the observed one
    p_value = np.mean(np.abs(null_diffs) >= np.abs(observed_diff))
    return p_value

Bootstrap vs t-test
T-test
- Assumes a normal distribution (or that the CLT applies)
- Analytical
- Fast
- Standard
Bootstrap
- Non-parametric
- Works for almost any statistic (median, percentile)
- Slower to compute
- Flexible
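When to use bootstrap
1. Non-normal data
Revenue and session length are heavy-tailed. The sample mean converges to normal slowly, so t-test intervals can have poor coverage at moderate sample sizes.
2. Custom metrics
"Average LTV per cohort" with complex aggregation. An analytical variance for the t-test is hard to derive.
3. Ratio metrics
"Revenue per session" is a ratio. The variance calculation is complex, while bootstrap is easy.
4. Median / percentile
The t-test targets the mean. Bootstrap works for almost any statistic.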
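For the ratio-metric case above, a minimal sketch might look like the following. It assumes per-user arrays `revenue` and `sessions` (hypothetical names) and resamples users rather than sessions, so that the sessions of one user stay together. The data is synthetic and only illustrates the mechanics.

```python
import numpy as np

def bootstrap_ratio_ci(revenue, sessions, n_iter=2000, alpha=0.05, seed=0):
    """Percentile CI for sum(revenue) / sum(sessions), resampling users."""
    revenue = np.asarray(revenue, dtype=float)
    sessions = np.asarray(sessions, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(revenue)
    ratios = np.empty(n_iter)
    for i in range(n_iter):
        # Draw user indices with replacement; numerator and denominator
        # are recomputed from the same resampled users
        idx = rng.integers(0, n, size=n)
        ratios[i] = revenue[idx].sum() / sessions[idx].sum()
    return np.percentile(ratios, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Synthetic per-user data, for illustration only
rng = np.random.default_rng(42)
sessions = rng.integers(1, 10, size=500)
revenue = sessions * rng.exponential(scale=5.0, size=500)

point = revenue.sum() / sessions.sum()
low, high = bootstrap_ratio_ci(revenue, sessions)
print(point, low, high)
```

Resampling the unit of randomization (here, the user) is the design choice that keeps the independence assumption plausible.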
Bootstrap pitfalls
1. Small N
With N = 20, bootstrap does not magically fix anything. It just reuses the same 20 points.
Rule of thumb: at least N = 100 for the bootstrap to be reliable.
2. Dependence
Bootstrap assumes IID observations. Time series and clustered data violate this.
Use a block bootstrap for time series.
3. Computational cost
10,000 iterations × a complex metric × large data can take hours.
Parallelize or subsample.
4. Extreme values
If the metric has heavy outliers, they dominate the resampled statistic.
Use robust statistics (median, winsorized mean).
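The block bootstrap mentioned under "Dependence" can be sketched roughly as follows. This is one simple variant (a moving block bootstrap with overlapping blocks). The function name, the block length of 7, and the AR(1) series are assumptions chosen for illustration, not a recommendation.

```python
import numpy as np

def block_bootstrap_means(series, block_len=7, n_iter=2000, seed=0):
    """Moving block bootstrap replicates of the mean of a time series."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    rng = np.random.default_rng(seed)
    n_blocks = int(np.ceil(n / block_len))
    offsets = np.arange(block_len)
    means = np.empty(n_iter)
    for i in range(n_iter):
        # Pick random block start positions, then expand each into a
        # contiguous run of block_len indices to preserve local dependence
        starts = rng.integers(0, n - block_len + 1, size=n_blocks)
        idx = (starts[:, None] + offsets[None, :]).ravel()[:n]
        means[i] = series[idx].mean()
    return means

# Synthetic autocorrelated series (AR(1)), for illustration only
rng = np.random.default_rng(1)
noise = rng.normal(size=200)
series = np.empty(200)
series[0] = noise[0]
for t in range(1, 200):
    series[t] = 0.7 * series[t - 1] + noise[t]

boot = block_bootstrap_means(series)
ci = np.percentile(boot, [2.5, 97.5])
print(ci)
```

The block length should be long enough to cover the dependence in the data. Choosing it is a judgment call that this sketch does not address.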
Bayesian A/B
A related approach: build a posterior for the metric and compute the probability that B beats A.

# Simplified Beta-Binomial model
import numpy as np
from scipy.stats import beta

# conversions_c, users_c, conversions_t, users_t are the observed counts,
# assumed to be defined elsewhere
# Prior: Beta(1, 1) = uniform
alpha_c, beta_c = 1 + conversions_c, 1 + (users_c - conversions_c)
alpha_t, beta_t = 1 + conversions_t, 1 + (users_t - conversions_t)

# Sample from the posteriors
samples_c = beta.rvs(alpha_c, beta_c, size=10000)
samples_t = beta.rvs(alpha_t, beta_t, size=10000)

# Probability that treatment > control
prob = np.mean(samples_t > samples_c)

This is not bootstrap, but the idea is similar: draw samples, here from the posterior.
Use in companies
- Airbnb, Uber: bootstrap for booking metrics
- Netflix: complex metrics via bootstrap
- Microsoft ExP: combines t-test and bootstrap
Modern A/B platforms usually support bootstrap internally.
Performance tricks
NumPy vectorization
Avoid Python for-loops. Operate on NumPy arrays.
Subsample
For very large data, subsample first, then bootstrap.
Parallel
Use multiprocessing for parallelization.
JAX / numba
Compile the resampling loop for a speedup.
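As an illustration of the NumPy vectorization tip, here is a minimal sketch that draws all resampling indices at once instead of looping. The function name is hypothetical. Note the trade-off: the index matrix has `n_iter × N` entries, so for large data it may need to be processed in chunks.

```python
import numpy as np

def bootstrap_means_vectorized(data, n_iter=2000, seed=0):
    """Bootstrap replicates of the mean without a Python loop."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # One row of indices per bootstrap sample: shape (n_iter, N)
    idx = rng.integers(0, len(data), size=(n_iter, len(data)))
    # Fancy indexing builds all samples; mean over axis 1 gives one value per sample
    return data[idx].mean(axis=1)

# Synthetic skewed data, for illustration only
rng = np.random.default_rng(123)
data = rng.exponential(scale=10, size=1000)

boot = bootstrap_means_vectorized(data)
print(np.percentile(boot, [2.5, 97.5]))
```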
Connecting to traditional methods
For normal data with large N, the bootstrap CI ≈ the normal-theory CI.
For non-normal data, bootstrap often gives better coverage.
The t-test is a good default, bootstrap a safer fallback.
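The claim that both intervals roughly agree on normal data with large N can be checked with a small sketch like this one. The data is synthetic, and the parameters (mean 100, standard deviation 15, N = 2000) are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=100.0, scale=15.0, size=2000)

# Classical t-based 95% CI for the mean
t_low, t_high = stats.t.interval(
    0.95, df=len(data) - 1, loc=data.mean(), scale=stats.sem(data)
)

# Percentile bootstrap 95% CI for the mean (2000 resamples, vectorized)
idx = rng.integers(0, len(data), size=(2000, len(data)))
boot_means = data[idx].mean(axis=1)
b_low, b_high = np.percentile(boot_means, [2.5, 97.5])

print("t-interval:        ", t_low, t_high)
print("bootstrap interval:", b_low, b_high)
```

On skewed data the two intervals would be expected to diverge more, which is where the cross-check becomes informative.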
In the interview
"What is bootstrap?" Resample the observed data with replacement and estimate the distribution of a statistic.
"When to use it?" Non-normal data, complex metrics, samples where the normal approximation is doubtful (but not tiny ones).
"Alternatives?" T-test (normal data), Bayesian (priors), permutation test (null hypothesis).
"Is bootstrap always better?" No. It is slower and needs roughly N = 100+.
Common mistakes
Bootstrap on a tiny N
It does not work well.
Ignoring assumptions
You still need independence and representative data.
Applying it blindly
Check that bootstrap fits the problem. It is not magic.
No verification
Cross-check with another method (such as a t-test) for sanity.
Related topics
- Bootstrap in plain words
- A/B test in plain words
- t-test in plain words
- CUPED in plain words
- Log-normal distribution
FAQ
How many iterations?
10,000 is usually enough.
Does it work for medians?
Yes. Bootstrap gives the sampling distribution of the median.
Bayesian vs bootstrap?
Different frameworks. Bayesian: priors plus posterior. Bootstrap: frequentist resampling. They are complementary.
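To make the answer about medians concrete, here is a minimal sketch of a percentile bootstrap for an arbitrary statistic. The helper `bootstrap_ci` and its `stat_fn` parameter are hypothetical names, and the log-normal data is synthetic.

```python
import numpy as np

def bootstrap_ci(data, stat_fn=np.median, n_iter=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for any statistic computed by stat_fn."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    replicates = np.array([
        # Each replicate: resample N values with replacement, then apply stat_fn
        stat_fn(rng.choice(data, size=len(data), replace=True))
        for _ in range(n_iter)
    ])
    return np.percentile(replicates, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Synthetic heavy-tailed data, for illustration only
rng = np.random.default_rng(7)
data = rng.lognormal(mean=3.0, sigma=1.0, size=1000)

median_low, median_high = bootstrap_ci(data)
p90_low, p90_high = bootstrap_ci(data, stat_fn=lambda x: np.percentile(x, 90))
print(median_low, median_high)
print(p90_low, p90_high)
```

The same helper handles the 90th percentile by swapping `stat_fn`, which is the flexibility the article attributes to bootstrap.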