Monte Carlo simulation for analysts
Карьерник — a Telegram quiz trainer with 1500+ questions for analyst interviews. SQL, Python, A/B, metrics. Free.
Why know this
Monte Carlo is a powerful technical tool. Risk analysis, forecasting ranges, bootstrap — they all rest on the same basis.
Not every analyst uses it, but knowing it is an expected middle+ skill.
What it is
Generate many random scenarios → get a distribution of outcomes.
It answers questions like: "With what probability does X happen?"
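A minimal toy illustration of the idea (not from the article): estimating π by sampling random points in the unit square.

```python
import numpy as np

# Classic toy example: throw random points into the unit square and
# count how many land inside the quarter circle; that fraction ~ pi/4.
rng = np.random.default_rng(42)
n = 1_000_000
x = rng.random(n)
y = rng.random(n)
inside = (x**2 + y**2 <= 1.0).mean()  # fraction inside the quarter circle
pi_estimate = 4 * inside
print(f"pi ~ {pi_estimate:.3f}")
```

More scenarios → a tighter estimate; the same mechanic powers every business example that follows.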
When to use
- Risk analysis: financial, operational
- Project estimates: cost ranges
- Forecasting: confidence intervals
- Pricing: scenario analysis
- Operations: queue simulation
- Bootstrap (a related technique)
Example: project estimate
"What will the marketing campaign cost?"
Inputs:
- CPC: normally distributed (mean 50 ₽, std 10)
- Clicks needed: 10,000-20,000 (uniform)
- Conversion rate: beta distribution
Model:
import numpy as np

n_sims = 10000
costs = []
for _ in range(n_sims):
    cpc = np.random.normal(50, 10)
    clicks = np.random.uniform(10000, 20000)
    # conversion rate from the inputs is left out of this simplified model
    cost = cpc * clicks
    costs.append(cost)

# Distribution of costs
print(f"Mean: {np.mean(costs):,.0f}")
print(f"Median: {np.median(costs):,.0f}")
print(f"P5-P95: {np.percentile(costs, 5):,.0f} - {np.percentile(costs, 95):,.0f}")
You get the full distribution of outcomes, not a single number.
Revenue forecast
import numpy as np

n_sims = 10000
daily_revenues = []
for _ in range(n_sims):
    # Assume daily users are normally distributed
    users = np.random.normal(1000, 100)
    # Conversion rate is beta distributed
    cr = np.random.beta(5, 95)  # ~5% mean
    # Average order value is log-normal
    aov = np.random.lognormal(np.log(1000), 0.5)
    revenue = users * cr * aov
    daily_revenues.append(revenue)

print(f"Expected: {np.mean(daily_revenues):,.0f}")
print(f"90% CI: {np.percentile(daily_revenues, 5):,.0f} - {np.percentile(daily_revenues, 95):,.0f}")
This gives a point estimate plus an interval.
A/B power analysis
Simulate A/B test results:
import numpy as np

def simulate_ab(n_per_group, p_control, p_treatment, n_sims=1000):
    significant = 0
    for _ in range(n_sims):
        c_conv = np.random.binomial(n_per_group, p_control)
        t_conv = np.random.binomial(n_per_group, p_treatment)
        # Two-proportion z-test
        p_c = c_conv / n_per_group
        p_t = t_conv / n_per_group
        p_pool = (c_conv + t_conv) / (2 * n_per_group)
        se = np.sqrt(p_pool * (1 - p_pool) * 2 / n_per_group)
        z = (p_t - p_c) / se
        if abs(z) > 1.96:
            significant += 1
    return significant / n_sims

# Power: share of simulated tests that reach significance
power = simulate_ab(10000, 0.10, 0.11)
print(f"Power: {power:.2%}")
An alternative to closed-form power formulas: more work, but more flexible.
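A natural extension, sketched under the same z-test assumptions: scan sample sizes to find the n that reaches a target power. `simulate_ab` is repeated here so the block is self-contained; the seed is my addition for reproducibility.

```python
import numpy as np

def simulate_ab(n_per_group, p_control, p_treatment, n_sims=1000, seed=0):
    # Same simulated z-test as above, with a fixed seed for reproducibility
    rng = np.random.default_rng(seed)
    significant = 0
    for _ in range(n_sims):
        c_conv = rng.binomial(n_per_group, p_control)
        t_conv = rng.binomial(n_per_group, p_treatment)
        p_c = c_conv / n_per_group
        p_t = t_conv / n_per_group
        p_pool = (c_conv + t_conv) / (2 * n_per_group)
        se = np.sqrt(p_pool * (1 - p_pool) * 2 / n_per_group)
        z = (p_t - p_c) / se
        if abs(z) > 1.96:
            significant += 1
    return significant / n_sims

# Scan sample sizes to find the smallest n reaching ~80% power
for n in [5000, 10000, 20000, 40000]:
    print(f"n={n}: power={simulate_ab(n, 0.10, 0.11):.0%}")
```

Power grows with n; pick the smallest n that clears your target.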
Queue simulation
"What is the average wait time in a call center?"
# Arrival times (Poisson)
# Service times (exponential)
# Simulate the queue
def simulate_queue(arrival_rate, service_rate, n_agents, duration):
    # Discrete-event simulation
    ...
Lets you model operational scenarios.
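The stub can be filled in many ways. Here is a minimal single-server (M/M/1) sketch using Lindley's recursion — an illustrative assumption of mine, not a full multi-agent call-center model. For M/M/1 theory predicts an average wait of ρ/(μ−λ), i.e. 4 minutes at the rates below.

```python
import numpy as np

def simulate_mm1_wait(arrival_rate, service_rate, n_customers=100_000, seed=1):
    # Single-server queue: exponential interarrival and service times.
    rng = np.random.default_rng(seed)
    interarrivals = rng.exponential(1 / arrival_rate, n_customers)
    services = rng.exponential(1 / service_rate, n_customers)
    wait = 0.0
    total = 0.0
    for s, a in zip(services, interarrivals):
        total += wait
        # Lindley's recursion: the next customer waits whatever is left over
        wait = max(0.0, wait + s - a)
    return total / n_customers

# rho = 0.8 -> theoretical average wait = 0.8 / (1.0 - 0.8) = 4 minutes
print(f"Avg wait: {simulate_mm1_wait(0.8, 1.0):.2f} min")
```

Multi-agent versions are where a discrete-event library like simpy starts to pay off.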
Pricing scenarios
"How does revenue change at a +10% price?"
Inputs:
- Demand elasticity (distribution)
- Current customers (known)
- Retention impact (distribution)
Simulated outcomes:
- Revenue +5% (most likely)
- Range: -3% to +12%
This supports the decision with a range, not a single guess.
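A hypothetical sketch of such a pricing simulation — every distribution and parameter below is an illustrative assumption of mine, not data from the article:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims = 10_000
price_change = 0.10

# Assumed input distributions (illustrative only)
elasticity = rng.normal(-0.4, 0.15, n_sims)   # demand elasticity
retention_hit = rng.beta(1, 99, n_sims)       # extra churn, ~1% mean

demand_change = elasticity * price_change      # e.g. -4% demand at e=-0.4
revenue_change = (1 + price_change) * (1 + demand_change) * (1 - retention_hit) - 1

print(f"Most likely: {np.median(revenue_change):+.1%}")
print(f"Range (p5-p95): {np.percentile(revenue_change, 5):+.1%} "
      f"to {np.percentile(revenue_change, 95):+.1%}")
```

The output is exactly the "most likely + range" shape stakeholders need.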
Bootstrap
A related technique: resample the data → estimate the distribution of a statistic.
import numpy as np

def bootstrap(data, stat_func, n_iter=10000):
    stats = []
    for _ in range(n_iter):
        sample = np.random.choice(data, len(data), replace=True)
        stats.append(stat_func(sample))
    return stats

# Distribution of the median
median_samples = bootstrap(data, np.median)
ci = np.percentile(median_samples, [2.5, 97.5])
Non-parametric inference without distributional assumptions.
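A vectorized variant of the same idea: draw all resample indices at once instead of looping. The sample data here is synthetic, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.lognormal(np.log(1000), 0.5, size=500)  # illustrative sample

n_iter = 10_000
# One row of indices per bootstrap resample
idx = rng.integers(0, len(data), size=(n_iter, len(data)))
medians = np.median(data[idx], axis=1)             # one median per resample
ci = np.percentile(medians, [2.5, 97.5])
print(f"Median 95% CI: {ci[0]:,.0f} - {ci[1]:,.0f}")
```

Same result as the loop, an order of magnitude faster.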
Distributions
Random sampling in Python:
# Normal
np.random.normal(mu, sigma, size)
# Uniform
np.random.uniform(low, high, size)
# Log-normal
np.random.lognormal(mu, sigma, size)
# Beta
np.random.beta(a, b, size)
# Exponential
np.random.exponential(scale, size)
# Poisson
np.random.poisson(lam, size)
# Binomial
np.random.binomial(n, p, size)
Validation
Sanity check
- Are the results realistic?
- Are edge cases covered?
- Does the distribution shape make sense?
Convergence
More iterations → a more stable result.
Check: double n_sims — does the result stay similar?
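A sketch of this check on the campaign-cost model from earlier (the helper name and seeds are mine):

```python
import numpy as np

def estimate_cost(n_sims, seed=0):
    # Re-run the campaign-cost simulation with a given number of draws
    rng = np.random.default_rng(seed)
    cpc = rng.normal(50, 10, n_sims)
    clicks = rng.uniform(10_000, 20_000, n_sims)
    return np.mean(cpc * clicks)

# The estimate should stabilize as n_sims grows
for n in [1_000, 10_000, 100_000]:
    print(f"n={n}: mean cost {estimate_cost(n):,.0f}")
```

The true mean is 50 × 15,000 = 750,000 ₽; at 1,000 draws the estimate can still wander by around a percent, and tail percentiles converge even more slowly.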
Assumption check
Is the right distribution chosen? Are the parameters correct?
Performance
Vectorize
Use numpy arrays, not loops:
# Loop (slow)
for _ in range(n):
    ...
# Vectorized (fast)
cpc = np.random.normal(50, 10, size=n_sims)
clicks = np.random.uniform(10000, 20000, size=n_sims)
cost = cpc * clicks
A 10-100x speedup.
Parallel
For long simulations, use multiprocessing.
Memory
With a large n_sims, watch memory usage.
Limitations
Assumption-dependent
Garbage in, garbage out. Wrong distribution → wrong result.
Not causal
Shows correlations, not causation.
Black box
To stakeholders, Monte Carlo can look opaque.
Explain it carefully.
Communication
Present ranges
"Revenue 2026: expected 10M ₽, 90% confident it falls between 8-12M."
More informative than a point estimate.
Histogram / distribution
Show it visually.
Scenarios
- Worst case (p5)
- Most likely (median)
- Best case (p95)
Three numbers stakeholders can digest.
Common mistakes
Deterministic inputs
Using point estimates instead of distributions → false confidence.
Ignoring correlations
"Cost and conversion are independent" — maybe they are actually correlated.
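One hedged way to model such a dependence: sample correlated inputs from a multivariate normal (the correlation value here is an assumption for illustration):

```python
import numpy as np

# Sample CPC and conversion rate as correlated inputs (rho = -0.5 assumed)
rng = np.random.default_rng(3)
mean = [50, 0.05]                        # mean CPC, mean CR
sd_cpc, sd_cr, rho = 10, 0.01, -0.5
cov = [[sd_cpc**2,            rho * sd_cpc * sd_cr],
       [rho * sd_cpc * sd_cr, sd_cr**2]]
cpc, cr = rng.multivariate_normal(mean, cov, size=10_000).T

print(f"sample correlation: {np.corrcoef(cpc, cr)[0, 1]:.2f}")
```

For non-normal marginals, the usual next step is a Gaussian copula: sample correlated normals, then transform each column through the target CDFs.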
Too few iterations
1000 is too few for tail probabilities; 10000+ for stable estimates.
No sensitivity analysis
Which input drives the output? Run a sensitivity analysis.
Sensitivity analysis
Vary one input, check the output:
for cpc_mean in [40, 50, 60]:
    # Re-run the Monte Carlo with this input
    result = simulate(cpc_mean=cpc_mean)
    print(f"CPC {cpc_mean}: expected cost {result:,.0f}")
Shows which input matters most.
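A concrete version of that loop, with a hypothetical `simulate` helper that re-runs the campaign-cost model with one input shifted:

```python
import numpy as np

def simulate(cpc_mean, n_sims=100_000, seed=0):
    # Hypothetical helper: campaign-cost model with mean CPC as the
    # varied input; everything else (and the seed) held fixed.
    rng = np.random.default_rng(seed)
    cpc = rng.normal(cpc_mean, 10, n_sims)
    clicks = rng.uniform(10_000, 20_000, n_sims)
    return np.mean(cpc * clicks)

for cpc_mean in [40, 50, 60]:
    print(f"CPC {cpc_mean}: expected cost {simulate(cpc_mean):,.0f}")
```

Holding the seed fixed across runs isolates the effect of the input from simulation noise.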
Tools
Python
- numpy / scipy
- pymc (Bayesian)
- simpy (discrete-event)
R
- Native random functions
- rstan (Bayesian)
Commercial
- @RISK (Excel add-in)
- Crystal Ball
At the interview
"What is Monte Carlo?"
Simulating random scenarios → a distribution of outcomes.
"When is it used?"
Risk analysis, forecasting ranges, problems with no closed-form solution.
"What are its limits?"
Assumption-dependent. Not causal.
FAQ
Do analysts use it daily?
Occasionally, for specific tasks.
Is it related to ML?
Reinforcement learning uses it heavily; analysts less so.
Can you do it in Excel?
Yes, for simple simulations. Python scales better.
Practice it — open the trainer with 1500+ interview questions.