Time series forecasting for analysts
Карьерник (Karyernik) is a Telegram quiz trainer with 1500+ questions for analyst job interviews: SQL, Python, A/B testing, metrics. Free.
Why it matters
"What will revenue be next quarter?" is a typical analyst task. So are "What will demand for this product be during the sale?" and "When do we need to add servers?" All of these are time series forecasting.
In interviews for retail, e-commerce, infrastructure and finance roles, time series knowledge is expected from middle level and up.
Short explanation
Time series forecasting = predicting future values from past ones.
The difference from ordinary regression: the data are ordered in time, and observations can be dependent (today's value is often correlated with yesterday's).
Time series components
Trend
Long-term direction (growth or decline).
Seasonality
Periodic patterns (daily, weekly, yearly).
Cyclic
Fluctuations without a fixed period (e.g. business cycles).
Residual / noise
Variation left unexplained by the other components.
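Here is a minimal sketch in plain Python of splitting a series into these components with classical additive decomposition. It assumes the seasonal period is known and the series covers at least two full periods. The trend is a centered moving average. The seasonal component is the average detrended value at each position of the cycle. The residual is what remains. The function name `decompose_additive` and the synthetic data are illustrative. In practice, a library routine such as statsmodels' `seasonal_decompose` does the same job. A multiplicative version works the same way on ratios instead of differences.

```python
def decompose_additive(y, period):
    """Classical additive decomposition: y[t] = trend[t] + seasonal[t] + residual[t].

    Assumes a known seasonal period and len(y) >= 2 * period.
    Trend is undefined (None) for the first and last period // 2 points.
    """
    n = len(y)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        if period % 2 == 1:
            # Odd period: plain centered moving average over `period` points
            trend[t] = sum(window) / period
        else:
            # Even period: 2 x period centered moving average (half weight at both ends)
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period

    # Average the detrended values at each position of the cycle
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(y[t] - trend[t])
    raw = [sum(b) / len(b) for b in buckets]
    # Center the seasonal pattern so it sums to zero over one cycle
    offset = sum(raw) / period
    pattern = [s - offset for s in raw]

    seasonal = [pattern[t % period] for t in range(n)]
    residual = [None if trend[t] is None else y[t] - trend[t] - seasonal[t]
                for t in range(n)]
    return trend, seasonal, residual


# Synthetic example: linear trend plus a period-4 pattern that sums to zero
true_pattern = [2.0, -1.0, 0.0, -1.0]
y = [10 + 0.5 * t + true_pattern[t % 4] for t in range(24)]
trend, seasonal, residual = decompose_additive(y, period=4)
print(seasonal[:4])  # recovers true_pattern up to floating-point rounding
```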
Decomposition:
Y = Trend + Seasonality + Residual (additive)
or multiplicative:
Y = Trend × Seasonality × Residual
The multiplicative form fits when the seasonal swings grow with the level of the series.
Forecasting methods
1. Naive
forecast(t+1) = y(t)
The baseline. Surprisingly often it beats more complex models.
2. Moving average
forecast = average of the last N points
Smooths out short-term noise.
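Both baselines fit in a few lines. The following is a plain-Python sketch. The function names and the `sales` numbers are made up for illustration.

```python
def naive_forecast(y):
    """Naive forecast: the next value equals the last observed one."""
    return y[-1]


def moving_average_forecast(y, n):
    """Forecast the next value as the mean of the last n observations."""
    if not 1 <= n <= len(y):
        raise ValueError("n must be between 1 and len(y)")
    return sum(y[-n:]) / n


sales = [120, 130, 125, 140, 135]
print(naive_forecast(sales))              # 135
print(moving_average_forecast(sales, 3))  # (125 + 140 + 135) / 3, about 133.33
```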
3. Exponential smoothing
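Gives more weight to recent observations:
forecast(t+1) = α × y(t) + (1 - α) × forecast(t)
α is the smoothing factor (0 < α ≤ 1): the closer it is to 1, the faster the forecast reacts to the latest values.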
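The following sketch of simple exponential smoothing applies the formula above directly. Initialising the first forecast with the first observation is an assumption; it is one common choice among several. With α = 1 the method reduces to the naive forecast.

```python
def exponential_smoothing_forecast(y, alpha):
    """Simple exponential smoothing.

    Applies forecast(t+1) = alpha * y(t) + (1 - alpha) * forecast(t) over the whole
    series and returns the forecast for the next, not yet observed, point.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    forecast = y[0]  # assumption: the first forecast equals the first observation
    for value in y:
        forecast = alpha * value + (1 - alpha) * forecast
    return forecast


print(exponential_smoothing_forecast([10, 20, 30], alpha=0.5))  # 22.5
print(exponential_smoothing_forecast([10, 20, 30], alpha=1.0))  # 30.0, same as naive
```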
4. ARIMA
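AutoRegressive Integrated Moving Average, the classical statistical approach.
- AR: uses past values
- I: differencing to make the series stationary
- MA: uses past forecast errors
Parameters (p, d, q): AR order, number of differences, MA order. Usually chosen from ACF/PACF plots or by information criteria such as AIC.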
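To make the letters concrete, here is a toy ARIMA(1,1,0) without a constant, written by hand. It differences the series once (I, d = 1) and fits an AR(1) coefficient on the differences by least squares (AR, p = 1). It uses no MA terms (q = 0). It then adds the forecast differences back onto the last level. This is a teaching sketch under those simplifying assumptions, not how a library estimates ARIMA. For real work, something like `statsmodels.tsa.arima.model.ARIMA(y, order=(p, d, q))` is the usual route.

```python
def fit_ar1(x):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + noise (no intercept)."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den


def arima_110_forecast(y, steps):
    """Toy ARIMA(1,1,0) without a constant.

    I (d=1): work with first differences.
    AR (p=1): model each difference as phi * previous difference.
    MA (q=0): no past-error terms.
    Forecast differences are added back onto the last level (undo differencing).
    """
    if len(y) < 3:
        raise ValueError("need at least 3 observations")
    diffs = [y[t] - y[t - 1] for t in range(1, len(y))]
    phi = fit_ar1(diffs)
    level, diff = y[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        diff = phi * diff     # AR(1) forecast of the next difference
        level = level + diff  # integrate back to the original scale
        forecasts.append(level)
    return phi, forecasts


phi, fc = arima_110_forecast([0, 8, 12, 14, 15], steps=2)
print(phi, fc)  # 0.5 [15.5, 15.75]
```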
5. Prophet (Facebook)
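Automatically fits the trend (with changepoints), seasonality and holiday effects. Holidays are supplied by the user (a holidays table or add_country_holidays) rather than detected on their own.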
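from prophet import Prophet

# df is assumed to have exactly two columns: the date first, then the value
df.columns = ['ds', 'y']  # Prophet convention: ds = timestamp, y = value
model = Prophet()
model.fit(df)
future = model.make_future_dataframe(periods=30)  # history + 30 future days
forecast = model.predict(future)  # columns include yhat, yhat_lower, yhat_upper
Easy to use and a good fit for business time series.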
6. ML / Deep learning
XGBoost with lag features, LSTM, Transformers.
More complex, but powerful for large datasets with many features.
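Tree models such as XGBoost have no built-in notion of time. The series is therefore first turned into a regular table where each row's features are previous values (lags). The following is a minimal sketch; `make_lag_features` is a hypothetical helper name. The resulting `X` and `target` could then go into any regressor, trained and validated chronologically.

```python
def make_lag_features(y, lags):
    """Build a supervised table from a series.

    Row for time t: features [y[t-1], y[t-2], ..., y[t-lags]], target y[t].
    Only past values enter the features, so nothing from the future leaks in.
    """
    X, target = [], []
    for t in range(lags, len(y)):
        X.append([y[t - k] for k in range(1, lags + 1)])
        target.append(y[t])
    return X, target


X, target = make_lag_features([1, 2, 3, 4, 5], lags=2)
print(X)       # [[2, 1], [3, 2], [4, 3]]
print(target)  # [3, 4, 5]
```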
Metrics
MAE
Mean Absolute Error:
MAE = mean(|actual - predicted|)
MSE / RMSE
MSE = mean((actual - predicted)²), RMSE = √MSE. Squaring penalizes large errors more heavily; RMSE is back in the units of the data.
MAPE
Mean Absolute Percentage Error:
MAPE = mean(|actual - predicted| / |actual|) × 100%
A relative, scale-free measure. However, it is undefined when actual = 0 and blows up when actual is close to 0.
SMAPE
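Symmetric MAPE, one common definition: mean(|actual - predicted| / ((|actual| + |predicted|) / 2)) × 100%. Dividing by the average of the actual and predicted values eases the near-zero problem, but the metric is still undefined when both are 0.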
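The four metrics can be written as plain Python functions. This sketch assumes equal-length lists of actuals and predictions. SMAPE has several variants; this one divides by the mean of |actual| and |predicted|. Counting terms where both values are 0 as zero error is an assumed convention, not a standard.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: squaring makes large errors dominate."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent. Raises ZeroDivisionError if an actual is 0."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric MAPE, in percent (one common variant)."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        # Assumed convention: a term where both values are 0 counts as zero error
        total += 0.0 if denom == 0 else abs(a - p) / denom
    return 100 * total / len(actual)


actual = [100, 200]
predicted = [110, 180]
print(mae(actual, predicted))    # 15.0
print(rmse(actual, predicted))   # about 15.81
print(mape(actual, predicted))   # about 10.0
print(smape(actual, predicted))  # about 10.03
```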
Stationarity
A series is stationary if its mean, variance and autocorrelation structure do not change over time.
Many methods (e.g. ARIMA) assume stationarity. For a non-stationary series, apply differencing: Y_t - Y_{t-1}.
Tests:
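- ADF (Augmented Dickey-Fuller): the null hypothesis is that the series has a unit root (is non-stationary)
- KPSS: the null hypothesis is that the series is stationary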
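The following sketch shows first differencing and how to undo it. It also includes a crude check that compares the means of the two halves of a series. The half-means comparison only illustrates what "mean stable over time" means. It is not a substitute for ADF or KPSS, which are available, for example, as `adfuller` and `kpss` in `statsmodels.tsa.stattools`. The helper names are illustrative.

```python
def difference(y):
    """First difference: y[t] - y[t-1]; the result is one element shorter."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]


def undifference(first_value, diffs):
    """Invert first differencing, given the original first value."""
    levels = [first_value]
    for d in diffs:
        levels.append(levels[-1] + d)
    return levels


def half_means(x):
    """Crude illustration only: mean of the first half vs mean of the second half."""
    mid = len(x) // 2
    return sum(x[:mid]) / mid, sum(x[mid:]) / (len(x) - mid)


y = [10 + 2 * t for t in range(10)]  # linear trend: the mean drifts upward
print(half_means(y))                 # (14.0, 24.0)
d = difference(y)                    # every difference equals 2
print(half_means(d))                 # (2.0, 2.0)
print(undifference(y[0], d) == y)    # True
```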
Train/test split
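Never split randomly. Split chronologically:
Train: months 1-24
Test: months 25-30
Train always comes before test in time.
For cross-validation, use a rolling (fixed-size) window or an expanding window. Always train on the past and validate on the period that follows it.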
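The following sketch shows the chronological split from the example above (months 1-30, last 6 held out). It also generates indices for expanding-window and rolling-window cross-validation. The helper names are illustrative. scikit-learn's `TimeSeriesSplit` provides expanding-window splits, and its `max_train_size` parameter caps the training size.

```python
def chronological_split(y, test_size):
    """Hold out the last `test_size` (> 0) points; train on everything before them."""
    return y[:-test_size], y[-test_size:]


def time_series_cv_splits(n, initial, horizon, rolling=False):
    """Yield (train_idx, test_idx) index lists for time-series cross-validation.

    Expanding window (default): train always starts at index 0 and grows.
    Rolling window (rolling=True): train keeps a fixed size of `initial` points.
    Test is always the next `horizon` points after the training data.
    """
    end = initial
    while end + horizon <= n:
        start = end - initial if rolling else 0
        yield list(range(start, end)), list(range(end, end + horizon))
        end += horizon


months = list(range(1, 31))  # months 1..30
train, test = chronological_split(months, test_size=6)
print(train[-1], test)  # 24 [25, 26, 27, 28, 29, 30]
for train_idx, test_idx in time_series_cv_splits(30, initial=24, horizon=3, rolling=True):
    print(train_idx[0], train_idx[-1], test_idx)
# 0 23 [24, 25, 26]
# 3 26 [27, 28, 29]
```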
Practical pipeline
- EDA: plot the series, decompose it, check for seasonality
- Preprocess: handle missing values and outliers, log-transform if the data are heavily skewed
- Split into train/test chronologically
- Baseline: naive, moving average
- Try models: ARIMA, Prophet, XGBoost
- Evaluate on the test set against the baseline
- Iterate
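The baseline and evaluation steps can be sketched as a one-step-ahead walk-forward loop. Each test point is forecast only from the data before it, and a candidate's MAE is compared with the baseline's. The series and the two forecasters below are illustrative.

```python
def walk_forward_mae(y, test_size, forecaster):
    """One-step-ahead walk-forward evaluation on the last `test_size` points.

    Each test point is forecast using only the values before it, then compared
    with the actual value; the function returns the MAE over the test period.
    """
    errors = []
    for t in range(len(y) - test_size, len(y)):
        history = y[:t]  # only the past is visible to the forecaster
        errors.append(abs(y[t] - forecaster(history)))
    return sum(errors) / len(errors)


def naive(history):
    return history[-1]


def moving_average_3(history):
    return sum(history[-3:]) / 3


y = [10, 12, 11, 13, 12, 14, 13, 15]
print(walk_forward_mae(y, 3, naive))             # 5/3, about 1.67 (baseline)
print(walk_forward_mae(y, 3, moving_average_3))  # 4/3, about 1.33 (beats the baseline here)
```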
Practical code
import pandas as pd
from prophet import Prophet

# Data: sales.csv is assumed to have 'date' and 'sales' columns
df = pd.read_csv('sales.csv')
df['ds'] = pd.to_datetime(df['date'])
df['y'] = df['sales']

# Model
model = Prophet(
    yearly_seasonality=True,
    weekly_seasonality=True,
    changepoint_prior_scale=0.05,  # trend flexibility (0.05 is the default)
)
model.fit(df[['ds', 'y']])

# Forecast 30 days past the end of the history
future = model.make_future_dataframe(periods=30)
forecast = model.predict(future)

# Visualize: model.plot returns a matplotlib Figure
fig = model.plot(forecast)
Problems
Data leakage
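Accidentally using information from the future during training, e.g. a rolling feature that includes the current value, or scaling fitted on the whole series. Be careful with feature engineering.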
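A common source of leakage is a rolling feature that includes the current value, which is the target itself. The following plain-Python sketch computes a leakage-free trailing mean, where the feature for time t uses only the `window` values strictly before t. In pandas, the same idea is usually written as `s.shift(1).rolling(window).mean()`. The helper name is hypothetical.

```python
def trailing_mean_feature(y, window):
    """Leakage-free rolling feature: mean of the `window` values strictly before t.

    Uses y[t-window:t], which stops at t-1, so y[t] (the target) never enters
    its own feature. Returns None where there is not enough history yet.
    """
    features = []
    for t in range(len(y)):
        if t < window:
            features.append(None)
        else:
            features.append(sum(y[t - window:t]) / window)
    return features


print(trailing_mean_feature([1, 2, 3, 4, 10], window=2))  # [None, None, 1.5, 2.5, 3.5]
```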
Overfitting
A complex model trained on a short history generalizes poorly. Check it on a validation period.
Missed seasonality
If the history is short, seasonality cannot be captured: a yearly pattern needs several full years of data.
Structural breaks
Regime changes (COVID, a new competitor): old patterns no longer hold.
Use cases
Revenue forecasting
Budget planning.
Inventory
Demand prediction.
Staffing
Shift planning for call centers and retail.
Web traffic
Capacity planning.
A/B tests
Counterfactual: "what would have happened without the treatment".
In the interview
"What forecasting methods do you know?" Naive, moving average, ARIMA, Prophet, XGBoost.
"How do you split train/test?" Chronologically, not randomly.
"Which metrics?" MAE, RMSE, MAPE.
"Baseline first?" Always. The goal is to beat the baseline, not to reach some absolute accuracy.
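Common mistakes
Random split
Time series must never be split randomly.
Using only one metric
Different metrics tell different stories: RMSE is driven by large errors, MAPE by errors on small actual values.
Ignoring seasonality
Most business time series have some seasonality (weekly, yearly).
Chasing the best model
A simple model plus domain knowledge often beats a complex model without it.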
Related topics
- Time series explained simply
- Cross-validation explained simply
- Linear regression for analysts
- Anomaly detection
FAQ
Python vs R?
Python for integration with production systems, R for depth in statistics. Both are fine.
Is LSTM worth it?
Sometimes. Classical methods are often enough for business time series.
Always Prophet?
A good default, but not always the best. Always compare it with other models and the baseline.