Monte Carlo simulation for analysts

Why it matters

Monte Carlo is a powerful technical tool. Risk analysis, forecast ranges, and the bootstrap all build on it.

Not every analyst uses it, but knowing it is an expected skill at the middle+ level.

What it is

Generate many random scenarios, then study the resulting distribution of outcomes.

The question it answers: "With what probability does X happen?"
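
As a minimal sketch of the idea, a probability can be estimated as the share of simulated scenarios in which the event occurs. The dice setup and variable names are illustrative assumptions, chosen because the exact answer (6/36) is known and can be compared with the estimate.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator so the run is reproducible
n_sims = 200_000

# Each scenario: roll two fair six-sided dice (upper bound is exclusive)
rolls = rng.integers(1, 7, size=(n_sims, 2))
totals = rolls.sum(axis=1)

# Estimated probability = share of scenarios where the event happens
p_est = np.mean(totals >= 10)
p_exact = 6 / 36  # six of the 36 equally likely outcomes sum to 10 or more

print(f"Estimated: {p_est:.4f}, exact: {p_exact:.4f}")
```

In real problems there is usually no exact answer to compare with, which is why the simulation is needed in the first place.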

When to use it

  • Risk analysis: financial, operational
  • Project estimates: cost ranges
  • Forecasting: confidence intervals
  • Pricing: scenarios
  • Operational: queue simulation
  • Bootstrap (related)

Example: project estimate

"What is the expected cost of a marketing campaign?"

Inputs:

  • CPC: normally distributed (mean 50 ₽, std 10)
  • Clicks needed: 10000-20000 (uniform)
  • Conversion rate: beta distribution

Model:

import numpy as np

n_sims = 10000
costs = []

for _ in range(n_sims):
    cpc = np.random.normal(50, 10)
    clicks = np.random.uniform(10000, 20000)
    cost = cpc * clicks
    costs.append(cost)

# Distribution of costs
print(f"Mean: {np.mean(costs):,.0f}")
print(f"Median: {np.median(costs):,.0f}")
print(f"P5-P95: {np.percentile(costs, 5):,.0f} - {np.percentile(costs, 95):,.0f}")

The result is a full distribution of outcomes rather than a single number.
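
Once the distribution is simulated, a question like "how likely is it that we go over budget?" is a one-liner. The sketch below reuses the CPC and clicks inputs from the model above in vectorized form. The 1,000,000 ₽ budget is a hypothetical threshold, not a figure from the example.

```python
import numpy as np

rng = np.random.default_rng(42)
n_sims = 100_000

# Same input assumptions as the loop version above
cpc = rng.normal(50, 10, size=n_sims)
clicks = rng.uniform(10_000, 20_000, size=n_sims)
costs = cpc * clicks

budget = 1_000_000  # hypothetical budget, in ₽
p_over = np.mean(costs > budget)  # share of scenarios that exceed the budget

print(f"P(cost > {budget:,} ₽): {p_over:.1%}")
```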

Revenue forecast

import numpy as np

n_sims = 10000
daily_revenues = []

for _ in range(n_sims):
    # Assume daily users normally distributed
    users = np.random.normal(1000, 100)
    # Conversion beta distributed
    cr = np.random.beta(5, 95)  # ~5% mean
    # Avg order log-normal
    aov = np.random.lognormal(np.log(1000), 0.5)
    
    revenue = users * cr * aov
    daily_revenues.append(revenue)

print(f"Expected: {np.mean(daily_revenues):,.0f}")
print(f"90% CI: {np.percentile(daily_revenues, 5):,.0f} - {np.percentile(daily_revenues, 95):,.0f}")

Report a point estimate plus an interval.

A/B power analysis

Simulate A/B results:

import numpy as np

def simulate_ab(n_per_group, p_control, p_treatment, n_sims=1000):
    # Share of simulated experiments that reach significance = power
    significant = 0
    for _ in range(n_sims):
        c_conv = np.random.binomial(n_per_group, p_control)
        t_conv = np.random.binomial(n_per_group, p_treatment)

        # Two-proportion z-test with a pooled standard error
        p_c = c_conv / n_per_group
        p_t = t_conv / n_per_group
        p_pool = (c_conv + t_conv) / (2 * n_per_group)
        se = np.sqrt(p_pool * (1 - p_pool) * 2 / n_per_group)
        z = (p_t - p_c) / se

        # Two-sided test at alpha = 0.05
        if abs(z) > 1.96:
            significant += 1

    return significant / n_sims

# Power to detect a lift from 10% to 11% with 10,000 users per group
power = simulate_ab(10000, 0.10, 0.11)
print(f"Power: {power:.2%}")

This is an alternative to closed-form power formulas: more work to set up, but more flexible.

Queue simulation

"What is the average wait time in a call center?":

# Arrival times (Poisson)
# Service times (exponential)
# Simulate queue

def simulate_queue(arrival_rate, service_rate, n_agents, duration):
    # Discrete event simulation
    ...

This lets you model operational scenarios.
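
One possible way to fill in such a simulation is sketched below. It is not the author's implementation. It assumes Poisson arrivals, exponential service times, identical agents, and first-come-first-served order. The function name `simulate_queue_sketch` and the `seed` parameter are hypothetical.

```python
import heapq
import numpy as np

def simulate_queue_sketch(arrival_rate, service_rate, n_agents, duration, seed=0):
    """Return the wait time of every caller who arrives before `duration`."""
    rng = np.random.default_rng(seed)
    # Min-heap of the times at which each agent becomes free
    free_at = [0.0] * n_agents
    heapq.heapify(free_at)

    t = 0.0
    waits = []
    while True:
        # Poisson arrivals: exponential gaps between consecutive callers
        t += rng.exponential(1 / arrival_rate)
        if t > duration:
            break
        # The caller is served by whichever agent frees up first
        earliest = heapq.heappop(free_at)
        start = max(t, earliest)
        waits.append(start - t)
        # Exponential service time; the agent is busy until service ends
        heapq.heappush(free_at, start + rng.exponential(1 / service_rate))
    return np.array(waits)

for n_agents in [1, 2, 3]:
    waits = simulate_queue_sketch(arrival_rate=1.0, service_rate=1.2,
                                  n_agents=n_agents, duration=2000)
    print(f"{n_agents} agent(s): mean wait {waits.mean():.2f}")
```

Rates are per unit of time, so the wait times come out in the same unit. For anything more complex (priorities, abandonment, shifts), a dedicated library such as simpy is likely a better fit.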

Pricing scenarios

"How does revenue change if the price goes up by 10%?"

Inputs:

  • Demand elasticity (distribution)
  • Current customers (known)
  • Retention impact (distribution)

Simulated outcomes, for example:

  • Revenue +5% (most likely)
  • Range: -3% to +12%

A range like this supports the pricing decision.
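
A minimal sketch of such a pricing simulation follows. Every distribution and parameter in it is a hypothetical assumption made for illustration, so its output will not reproduce the figures quoted above.

```python
import numpy as np

rng = np.random.default_rng(42)
n_sims = 10_000
price_change = 0.10  # the +10% price increase being evaluated

# Hypothetical input distributions
elasticity = rng.normal(-0.5, 0.3, size=n_sims)   # % demand change per % price change
retention_loss = rng.beta(2, 98, size=n_sims)      # extra share of customers lost, ~2% mean

# Revenue factor = price factor * quantity factor * retained share
quantity_factor = 1 + elasticity * price_change
revenue_change = (1 + price_change) * quantity_factor * (1 - retention_loss) - 1

p5, p50, p95 = np.percentile(revenue_change, [5, 50, 95])
print(f"Revenue change: median {p50:+.1%}, range {p5:+.1%} to {p95:+.1%}")
```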

Bootstrap

A related technique: resample the data with replacement to estimate the distribution of a statistic.

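import numpy as np

def bootstrap(data, stat_func, n_iter=10000):
    # Resample with replacement and recompute the statistic each time
    estimates = []
    for _ in range(n_iter):
        sample = np.random.choice(data, len(data), replace=True)
        estimates.append(stat_func(sample))
    return estimates

# Distribution of the median; `data` is a 1-D array of observations
median_samples = bootstrap(data, np.median)
ci = np.percentile(median_samples, [2.5, 97.5])  # 95% percentile interval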

This is non-parametric inference: no parametric distribution has to be assumed for the data.

Distributions

Random sampling in Python:

# Normal
np.random.normal(mu, sigma, size)

# Uniform
np.random.uniform(low, high, size)

# Log-normal (mu and sigma describe the underlying normal, not the log-normal itself)
np.random.lognormal(mu, sigma, size)

# Beta
np.random.beta(a, b, size)

# Exponential (scale = 1 / rate)
np.random.exponential(scale, size)

# Poisson
np.random.poisson(lam, size)

# Binomial
np.random.binomial(n, p, size)
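
The calls above go through NumPy's global random state. A seeded `Generator` created with `np.random.default_rng` exposes the same distributions and makes a simulation reproducible. The seed and parameters in this sketch are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(123)  # arbitrary seed

normal_draws = rng.normal(0.0, 1.0, size=5)
beta_draws = rng.beta(5, 95, size=5)
poisson_draws = rng.poisson(3.0, size=5)

# A second generator with the same seed reproduces the same draws
rng_again = np.random.default_rng(123)
same = np.array_equal(normal_draws, rng_again.normal(0.0, 1.0, size=5))
print(f"Reproducible: {same}")
```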

Validation

Sanity check

  • Are the results realistic?
  • Are edge cases covered?
  • Does the shape of the distribution make sense?

Convergence

More iterations give a more stable result.

Check: double n_sims and see whether the result stays about the same.
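
The doubling check can be sketched as follows, using the campaign cost model from the project estimate example. The helper `p95_cost` and the choice of the 95th percentile as the tracked statistic are assumptions made for illustration.

```python
import numpy as np

def p95_cost(n_sims, seed):
    """95th percentile of campaign cost from n_sims simulated scenarios."""
    rng = np.random.default_rng(seed)
    cpc = rng.normal(50, 10, size=n_sims)
    clicks = rng.uniform(10_000, 20_000, size=n_sims)
    return np.percentile(cpc * clicks, 95)

# Double n_sims repeatedly, using a fresh seed for each run
sizes = [1_000 * 2**k for k in range(9)]  # 1,000 up to 256,000
estimates = [p95_cost(n, seed=k) for k, n in enumerate(sizes)]

rel_changes = []
for prev, curr, n in zip(estimates, estimates[1:], sizes[1:]):
    rel_change = abs(curr - prev) / prev
    rel_changes.append(rel_change)
    print(f"n_sims={n:>7,}: P95 = {curr:,.0f} (change vs previous: {rel_change:.2%})")
```

When the change between consecutive runs falls below the precision you need, adding more iterations no longer pays off.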

Assumption check

Is the right distribution chosen? Are the parameters correct?

Performance

Vectorize

Use numpy arrays, not loops:

# Loop (slow)
for _ in range(n_sims):
    ...

# Vectorized (fast): draw all scenarios at once
cpc = np.random.normal(50, 10, size=n_sims)
clicks = np.random.uniform(10000, 20000, size=n_sims)
cost = cpc * clicks

Typically a 10-100x speedup.

Parallel

For long simulations, use multiprocessing.

Efficient

With a large n_sims, watch memory usage.
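
One way to keep memory bounded is to simulate in batches and keep only running totals. The sketch below does this for the mean campaign cost. The function name and the batch size are hypothetical choices.

```python
import numpy as np

def mean_cost_in_batches(n_sims, batch_size=100_000, seed=0):
    """Mean campaign cost, holding at most batch_size scenarios in memory."""
    rng = np.random.default_rng(seed)
    total = 0.0
    done = 0
    while done < n_sims:
        size = min(batch_size, n_sims - done)
        cpc = rng.normal(50, 10, size=size)
        clicks = rng.uniform(10_000, 20_000, size=size)
        total += np.sum(cpc * clicks)  # keep the running sum, drop the batch
        done += size
    return total / n_sims

mean_cost = mean_cost_in_batches(1_000_000)
print(f"Mean cost: {mean_cost:,.0f}")
```

Running totals work for means and for counts such as "share of scenarios over budget". Exact percentiles need the full sample or an approximate streaming method.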

Limitations

Assumption-dependent

Garbage in, garbage out: the wrong distribution gives the wrong result.

Not causal

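A simulation only propagates the relationships you assume. It can show correlations in the output, but it does not establish causation.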

Black box

To stakeholders, Monte Carlo can look opaque.

Explain the inputs and assumptions carefully.

Communication

Present ranges

"Revenue 2026: expected 10M ₽, 90% confident it lands between 8M and 12M ₽."

This is more informative than a point estimate.

Histogram / distribution

Show the distribution visually.

Scenarios

  • Worst case (p5)
  • Most likely (median)
  • Best case (p95)

Three numbers that stakeholders can digest.

Common mistakes

Deterministic inputs

Using point estimates instead of distributions gives false confidence.

Ignoring correlations

"Cost and conversion are independent" is an assumption. They may well be correlated.
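
To see how much a correlation can matter, the sketch below draws two inputs jointly from a multivariate normal and compares the result with independent draws. It uses CPC and click volume as a stand-in pair. The normal distribution for clicks, its standard deviation, and the correlation of 0.6 are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n_sims = 100_000

mean = [50, 15_000]            # CPC (₽), clicks
sd_cpc, sd_clicks = 10, 2_500
rho = 0.6                      # hypothetical correlation between the inputs

# The covariance matrix encodes both the spreads and the correlation
cov = [[sd_cpc**2, rho * sd_cpc * sd_clicks],
       [rho * sd_cpc * sd_clicks, sd_clicks**2]]
draws = rng.multivariate_normal(mean, cov, size=n_sims)
cost_corr = draws[:, 0] * draws[:, 1]

# Same marginal distributions, but drawn independently
cost_indep = rng.normal(50, sd_cpc, n_sims) * rng.normal(15_000, sd_clicks, n_sims)

for label, cost in [("independent", cost_indep), ("correlated", cost_corr)]:
    print(f"{label:>11}: std {cost.std():,.0f}, P95 {np.percentile(cost, 95):,.0f}")
```

With a positive correlation, high CPC and high click volume tend to occur together, so the cost distribution is wider than the independent version suggests.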

Too few iterations

1,000 iterations is too few for tail probabilities. Use 10,000+ for stable estimates.
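
The reason is the standard error of an estimated probability, sqrt(p(1 - p) / n), which is large relative to p when p is small. A short sketch with a hypothetical 1% tail event and a hypothetical helper name:

```python
import numpy as np

def tail_prob_rel_error(p, n_sims):
    """Relative standard error of a Monte Carlo estimate of probability p."""
    se = np.sqrt(p * (1 - p) / n_sims)
    return se / p

for n_sims in [1_000, 10_000, 100_000]:
    rel = tail_prob_rel_error(0.01, n_sims)
    print(f"n_sims={n_sims:>7,}: relative standard error for p=1% is {rel:.1%}")
```

For a 1% event, 1,000 iterations leave a relative error of roughly 30%, while 10,000 bring it down to about 10%.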

No sensitivity

Which input drives the output? Run a sensitivity analysis to find out.

Sensitivity analysis

Vary one input at a time and check how the output changes.

for cpc_mean in [40, 50, 60]:
    # Run Monte Carlo; simulate() stands for your own simulation function
    result = simulate(cpc_mean=cpc_mean)
    print(f"CPC {cpc_mean}: expected cost {result:,.0f}")

This shows which input matters most.
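
A runnable sketch of one-at-a-time sensitivity for the campaign cost model follows. The function `expected_cost` and the tested values are hypothetical. The same seed is reused for every run, so differences between runs come from the changed input rather than from random noise.

```python
import numpy as np

def expected_cost(cpc_mean=50, clicks_low=10_000, clicks_high=20_000,
                  n_sims=100_000, seed=0):
    """Expected campaign cost under the given input assumptions."""
    rng = np.random.default_rng(seed)
    cpc = rng.normal(cpc_mean, 10, size=n_sims)
    clicks = rng.uniform(clicks_low, clicks_high, size=n_sims)
    return np.mean(cpc * clicks)

baseline = expected_cost()

# Vary the mean CPC, everything else fixed
for cpc_mean in [40, 50, 60]:
    result = expected_cost(cpc_mean=cpc_mean)
    print(f"CPC mean {cpc_mean}: {result:,.0f} ({result / baseline - 1:+.1%} vs baseline)")

# Vary the upper bound on clicks, everything else fixed
for clicks_high in [16_000, 20_000, 24_000]:
    result = expected_cost(clicks_high=clicks_high)
    print(f"Clicks max {clicks_high:,}: {result:,.0f} ({result / baseline - 1:+.1%} vs baseline)")
```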

Tools

Python

  • numpy / scipy
  • pymc (Bayesian)
  • simpy (discrete-event)

R

  • Native random functions
  • rstan (Bayesian)

Commercial

  • @RISK (Excel add-in)
  • Crystal Ball

In the interview

"Monte Carlo?"

Simulate many random scenarios to get a distribution of outcomes.

"When?"

Risk analysis, forecast ranges, and problems with no analytical solution.

"Limits?"

Assumption-dependent. Not causal.

FAQ

Do analysts use it daily?

Occasionally, for specific tasks.

Is it related to ML?

Reinforcement learning uses it. Analysts use it less often.

Can it be done in Excel?

Yes, for simple simulations. Python scales better.