How to design an A/B test step by step

Карьерник is a quiz trainer in Telegram with 1500+ questions for analyst interviews. SQL, Python, A/B, metrics. Free.

Why you need this

An A/B test is not just "launch the code." A badly designed test leads to wrong conclusions. Following the right sequence of steps is a matter of discipline.

In interviews you may be asked to walk through an A/B test design. It demonstrates experience.

Workflow steps

  1. Hypothesis — what we test and why
  2. Primary metric — what to measure
  3. Secondary / guardrail metrics
  4. Sample size — how many users
  5. Randomization — assignment strategy
  6. Duration — how long to run
  7. Launch — deploy
  8. Monitor — during the run
  9. Analyze — after completion
  10. Decide — ship / rollback / iterate

Step 1: Hypothesis

Clear statement:

«If we change X, then Y will improve because Z».

Example:

«If we simplify the checkout form (remove 3 fields), then conversion rate will increase by 5% because there is less friction».

Bad hypotheses

  • «Improve UX» (vague)
  • «The redesign will be better» (no mechanism)
  • «Something happens» (no prediction)

Good hypothesis

  • Specific change
  • Specific metric
  • Predicted direction
  • Mechanism

Step 2: Primary metric

One main metric.

  • Measurable
  • Sensitive (can detect the expected lift)
  • Aligned with the business

Typical choices

  • Conversion rate
  • Revenue per user
  • Retention D7
  • NPS

Tradeoffs

  • Revenue: the ultimate business metric, but high variance
  • Conversion rate: sensitive, but a proxy
  • Retention: a long-term signal, but slow

Pick based on the goal.

Step 3: Secondary / guardrail metrics

Secondary

Additional context:

  • CR across segments
  • Average order value
  • Time on site

Guardrail

These must not get worse:

  • Churn
  • Support tickets
  • Errors / crashes

«Treatment increased conversion by 5% but tripled refunds» is a bad tradeoff.

Step 4: Sample size

See how to calculate sample size.

Inputs:

  • Alpha (typically 0.05)
  • Power (typically 0.80)
  • Baseline conversion rate
  • MDE (minimum detectable effect)

Use a formula or a calculator.
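
These inputs plug into the standard two-proportion power formula. A minimal stdlib-only sketch (the function name and the relative-MDE convention are illustrative assumptions, not from the article):

```python
from statistics import NormalDist

def sample_size_per_group(baseline, mde_rel, alpha=0.05, power=0.80):
    """Approximate N per group for a two-sided test of two proportions."""
    p1 = baseline
    p2 = baseline * (1 + mde_rel)                  # treatment rate at the relative MDE
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power = 0.80
    num = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
           + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return int(num / (p2 - p1) ** 2) + 1

# 10% baseline, 5% relative lift: tens of thousands of users per group
n = sample_size_per_group(0.10, 0.05)
```

Note how a larger MDE shrinks N quadratically: detecting a 20% relative lift needs roughly 15x fewer users than a 5% lift.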

Step 5: Randomization

Simple random

Users split 50/50. The most common option.

Stratified

Random within strata (platform, country). Balances confounders.

Cluster

Randomize whole groups (city, team). Used when there are network effects.

Identifier

User ID → hash → bucket.

Ensures consistent assignment over time.
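
A minimal sketch of hash-based bucketing (names are illustrative; salting the hash with the experiment name keeps assignments independent across experiments):

```python
import hashlib

def assign_variant(user_id: str, experiment: str) -> str:
    """Deterministic 50/50 assignment: the same user always gets the same bucket."""
    key = f"{experiment}:{user_id}".encode()
    bucket = int(hashlib.md5(key).hexdigest(), 16) % 100
    return "control" if bucket < 50 else "treatment"

# Stable over time: repeated calls for the same user agree
assert assign_variant("user_42", "checkout_v2") == assign_variant("user_42", "checkout_v2")
```

Because the hash is uniform, any large set of users lands close to 50/50 without storing assignments anywhere.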

Step 6: Duration

Minimum

Usually at least 2 weeks, to cover weekly patterns.

Maximum

Run until the planned N is reached, or until practical reasons force a stop.

Considerations

  • Novelty effect (settles in 2-4 weeks)
  • Seasonality
  • Traffic fluctuations

Pre-register

Set the duration and N upfront. This prevents peeking.
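
The planned duration follows from the required N and the daily eligible traffic. A sketch under those assumptions (names illustrative):

```python
import math

def planned_duration_days(n_per_group, daily_eligible_users, groups=2, min_days=14):
    """At least two full weeks; longer if traffic is too low to reach N sooner."""
    days_to_reach_n = math.ceil(n_per_group * groups / daily_eligible_users)
    return max(min_days, days_to_reach_n)

planned_duration_days(58000, 20000)  # N is reached in ~6 days, but still run 14
```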

Шаг 7: Launch

Ramp-up

Sometimes roll out gradually:

  • 1% → 10% → 50% → full
  • Catches critical bugs without full exposure

Feature flags

Use a feature flag system:

  • Routes each user based on their bucket
  • Can be turned off quickly

Observability

  • Log treatment assignment
  • Track key events

Step 8: Monitor

During the run

  • Sample ratio mismatch (SRM)
  • Error rate spikes
  • Severe regression
  • Guardrail violations

If there is severe harm, kill the experiment early.
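
An SRM check is a chi-square goodness-of-fit test on group sizes. A stdlib-only sketch for a two-group split (with 1 degree of freedom the p-value can be computed from the normal distribution; names and the alpha threshold are illustrative):

```python
from math import sqrt
from statistics import NormalDist

def srm_check(n_control, n_treatment, expected_share=0.5, alpha=0.001):
    """Chi-square test for sample ratio mismatch; a tiny p-value means
    assignment is broken and the results should not be trusted."""
    total = n_control + n_treatment
    expected_c = total * expected_share
    expected_t = total - expected_c
    chi2 = ((n_control - expected_c) ** 2 / expected_c
            + (n_treatment - expected_t) ** 2 / expected_t)
    # For 1 degree of freedom: P(Chi2 > x) = 2 * (1 - Phi(sqrt(x)))
    p_value = 2 * (1 - NormalDist().cdf(sqrt(chi2)))
    return p_value, p_value < alpha

# 5000 vs 5500 users on a 50/50 split is a clear mismatch
p, flagged = srm_check(5000, 5500)
```

A strict alpha (0.001 rather than 0.05) is common here, because an SRM alert invalidates the whole experiment.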

Don't peek at the primary metric

The results are tempting, but don't look at the lift midway (the peeking problem).

Peeking at guardrails

That's fine. It's a safety check.

Step 9: Analyze

Verify

  • SRM test
  • Assignment correct
  • Data integrity

Compute

  • Primary metric per variant
  • CI
  • P-value
  • Effect size

Segments

  • Per segment analysis
  • Heterogeneous effects

Validate

  • Cross-check with another metric
  • Sanity check
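
For a conversion-rate primary metric, the computation above amounts to a two-proportion z-test plus a confidence interval on the absolute lift. A stdlib-only sketch (names illustrative):

```python
from math import sqrt
from statistics import NormalDist

def compare_conversions(conv_c, n_c, conv_t, n_t, alpha=0.05):
    """Two-sided z-test for two proportions with a CI on the absolute lift."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    lift = p_t - p_c
    # Pooled standard error for the test statistic
    p_pool = (conv_c + conv_t) / (n_c + n_t)
    se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    z = lift / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled standard error for the confidence interval
    se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {"lift": lift, "p_value": p_value,
            "ci": (lift - z_crit * se, lift + z_crit * se)}

# 10% -> 11% conversion on 10k users per group
result = compare_conversions(1000, 10000, 1100, 10000)
```

Report the CI alongside the p-value: «lift 1 pp, 95% CI [0.15 pp, 1.85 pp]» is far more informative than «p < 0.05».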

Step 10: Decide

Framework

  • Primary metric significant and positive → strong case to ship
  • Primary not significant → usually don't ship
  • Primary positive but a guardrail negative → tradeoff analysis
  • Mixed results across segments → partial ship possible
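
The framework above can be sketched as a small decision function (labels and ordering are illustrative; it is not a substitute for the tradeoff and business analysis below):

```python
def decide(primary_significant: bool, primary_positive: bool, guardrails_ok: bool) -> str:
    """Map experiment outcomes to the default decision from the framework."""
    if primary_significant and primary_positive and guardrails_ok:
        return "ship"
    if primary_significant and primary_positive:
        return "tradeoff analysis"   # wins on primary, but a guardrail suffered
    if not primary_significant:
        return "don't ship"
    return "rollback"                # significant and negative
```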

Other considerations

  • Business value
  • Engineering cost to maintain
  • Long-term effects (if measured)

Plan document template

# A/B Test Plan: [Feature Name]

## Hypothesis
If we change X, then Y will improve because Z.

## Primary metric
[Metric, definition, target MDE]

## Secondary
- X, Y, Z

## Guardrails
- Must not decrease: A, B, C

## Sample size
[Calculated N per group]

## Duration
[Planned weeks, based on traffic]

## Randomization
[Unit, split]

## Analysis plan
- T-test / chi-square
- Segments to check
- Stopping rules

## Risks
- Novelty effect possible
- Network effects from X

## Timeline
- Week 1: implementation
- Week 2-3: run
- Week 4: analyze

Common designs

Classic A/B

1 control, 1 treatment. Simplest.

A/B/n

Multiple variants (A, B, C, D). More options to compare, but needs more traffic.

Factorial

Test multiple variables. 2×2 design: e.g., button color × copy.
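
Factorial assignment can reuse hash bucketing with a separate salt per factor, so the two variables are randomized independently. A sketch (names illustrative):

```python
import hashlib

def factor_level(user_id: str, factor: str, levels=("A", "B")) -> str:
    """Independent deterministic assignment for one factor of a factorial design."""
    h = int(hashlib.md5(f"{factor}:{user_id}".encode()).hexdigest(), 16)
    return levels[h % len(levels)]

# Each user lands in one of 4 cells of the 2x2 design
cell = (factor_level("user_42", "button_color"), factor_level("user_42", "copy"))
```

Independent hashes per factor give all four combinations roughly equal traffic, which is what lets you estimate both main effects and the interaction.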

Holdout

Some users never get the new features. Used for long-term impact measurement.

Switchback

Variants alternate over time for the same units (e.g., by day or hour). Used for network effects.

Pitfalls

Peeking

Multiple looks at the data inflate the false positive rate.

SRM

Assignment unbalanced → biased results.

Leakage

Treatment affects control.

Novelty

Temporary lift from freshness.

Survivorship

Only the «survivors» are analyzed.

Multiple testing

Test 10 metrics and one will be significant by chance. Apply a correction.
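
The simplest correction is Bonferroni: with m metrics, require p < alpha / m. A sketch:

```python
def bonferroni_significant(p_values, alpha=0.05):
    """With m metrics, a result counts as significant only if p < alpha / m."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

# 10 metrics: each must clear 0.005, not 0.05
flags = bonferroni_significant([0.004, 0.03, 0.2] + [0.5] * 7)
```

Bonferroni is conservative; Holm or FDR-based corrections keep more power, but the principle is the same.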

Examples of bad tests

Without hypothesis

«Test the new design». What effect is expected? How will it be measured?

Too short

A 2-day run. Not enough data.

Wrong metric

Testing for revenue impact but measuring only clicks.

Changing midway

«Let's add another variant» mid-test breaks the statistics.

Ethical

Informed consent?

Do users know? Usually not (this is standard in A/B testing).

Harm potential

Test must not significantly harm users.

Fairness

Be especially careful with pricing experiments.

Tools

Platform

  • Optimizely
  • VWO
  • GrowthBook
  • Internal (Yandex ABT, Meta's platform)

Custom

Feature flags plus SQL analysis. Common at startups.

In the interview

«Design an A/B test for [feature]»

Walk through the 10 steps.

Hypothesis → metric → N → duration → analysis → decision.

Show process maturity.

FAQ

Do you always need an A/B test?

Not for everything. Big, reversible changes — yes. Small / OBS — no.

What about the Bayesian approach?

An alternative, probability-based approach. Safer with respect to peeking.

How many tests at once?

There is a limit. 5-10 major parallel tests per team is OK. Beyond that, check for interactions.


Practice A/B testing — open the trainer with 1500+ interview questions.