Time series forecasting for analysts


Why this matters

"What is the revenue forecast for next quarter?" is a typical analyst task. So are "What will demand for this product be during a sale?" and "When do we need to add servers?" All of these are time series forecasting problems.

In interviews for retail, e-commerce, infrastructure, and finance roles, time series knowledge is expected from mid-level candidates and above.

Short explanation

Time series forecasting means predicting future values from past ones.

The difference from ordinary regression: the data are ordered in time, and observations may depend on each other.

Components of a time series

Trend

Long-term direction (growth or decline).

Seasonality

Periodic patterns (daily, weekly, yearly).

Cyclic

Fluctuations without a fixed period (business cycles).

Residual / noise

Unexplained variance.

Decomposition (additive):

Y = Trend + Seasonality + Residual

or multiplicative:

Y = Trend × Seasonality × Residual
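The additive form above can be sketched in plain Python. This is a minimal illustration of classical decomposition, assuming an odd seasonal period and synthetic data with made-up numbers; the function names are hypothetical, and real projects would more likely rely on a statistics library.

```python
def centered_moving_average(y, period):
    """Trend estimate: mean of a window of `period` points centered on t (odd period)."""
    half = period // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half : t + half + 1]
        trend[t] = sum(window) / period
    return trend


def additive_decompose(y, period):
    """Split y into trend + seasonal + residual (classical additive decomposition)."""
    trend = centered_moving_average(y, period)
    # Detrend, then average the detrended values for each position in the cycle.
    buckets = [[] for _ in range(period)]
    for t, (value, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            buckets[t % period].append(value - tr)
    seasonal_means = [sum(b) / len(b) for b in buckets]
    # Center the seasonal pattern so it sums to zero over one cycle.
    offset = sum(seasonal_means) / period
    pattern = [s - offset for s in seasonal_means]
    seasonal = [pattern[t % period] for t in range(len(y))]
    residual = [
        None if tr is None else value - tr - s
        for value, tr, s in zip(y, trend, seasonal)
    ]
    return trend, seasonal, residual


# Synthetic daily data: linear trend plus a weekly pattern (made-up numbers).
weekly_pattern = [-3.0, -1.0, 0.0, 1.0, 2.0, 4.0, -3.0]
series = [100 + 0.5 * t + weekly_pattern[t % 7] for t in range(28)]
trend, seasonal, residual = additive_decompose(series, period=7)
```

The trend is undefined (None) at the edges, where a full centered window does not fit. For a multiplicative series, one common trick is to take logarithms first, which turns the product into a sum.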

Forecasting methods

1. Naive

forecast(t+1) = y(t)

A baseline. Surprisingly often it beats complex models.

2. Moving average

forecast = average of the last N points

Smooths out noise.
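Both baselines fit in a few lines. A minimal sketch, assuming the history is a plain list of numbers; the function names and the sales figures are made up for illustration.

```python
def naive_forecast(history):
    """Forecast the next value as the last observed value."""
    return history[-1]


def moving_average_forecast(history, n):
    """Forecast the next value as the mean of the last n observations."""
    if n <= 0 or n > len(history):
        raise ValueError("n must be between 1 and len(history)")
    window = history[-n:]
    return sum(window) / n


sales = [120, 130, 125, 140, 150, 145]  # made-up daily sales
print(naive_forecast(sales))              # 145
print(moving_average_forecast(sales, 3))  # (140 + 150 + 145) / 3 = 145.0
```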

3. Exponential smoothing

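Weights recent observations more heavily:

forecast(t+1) = α × y(t) + (1-α) × forecast(t)

α is the smoothing factor (0 < α ≤ 1). The larger α, the more weight goes to the most recent observations; α = 1 reduces to the naive forecast.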
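The recursion above can be written directly as a loop. A minimal sketch of simple exponential smoothing; seeding the first forecast with the first observation is an assumption (one common choice), and the function name is hypothetical.

```python
def exponential_smoothing_forecast(y, alpha):
    """One-step-ahead forecast via forecast(t+1) = alpha*y(t) + (1-alpha)*forecast(t)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    forecast = y[0]  # assumption: seed the recursion with the first observation
    for value in y:
        forecast = alpha * value + (1 - alpha) * forecast
    return forecast


print(exponential_smoothing_forecast([10, 20, 30], alpha=0.5))  # 22.5
print(exponential_smoothing_forecast([10, 20, 30], alpha=1.0))  # 30.0, same as naive
```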

4. ARIMA

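AutoRegressive Integrated Moving Average. The classical approach.

  • AR: uses past values
  • I: differencing to make the series stationary
  • MA: uses past forecast errors

Parameters (p, d, q): the AR order, the degree of differencing, and the MA order. They are tuned for each series.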
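To make the AR and I parts concrete, here is a hand-rolled sketch of the simplest case, ARIMA(1, 1, 0): difference once, fit one autoregressive coefficient by least squares, then undo the differencing. It has no MA term and no intercept, the function names are hypothetical, and it is an illustration only; in practice a library implementation would be the usual choice.

```python
def difference(y):
    """First difference: y[t] - y[t-1]."""
    return [b - a for a, b in zip(y, y[1:])]


def fit_ar1(x):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + error (no intercept)."""
    num = sum(prev * cur for prev, cur in zip(x, x[1:]))
    den = sum(prev * prev for prev in x[:-1])
    return num / den


def arima_110_forecast(y, steps):
    """Forecast `steps` values ahead with a simplified ARIMA(1, 1, 0)."""
    d = difference(y)   # I: d = 1
    phi = fit_ar1(d)    # AR: p = 1
    forecasts = []
    last_level, last_diff = y[-1], d[-1]
    for _ in range(steps):
        last_diff = phi * last_diff           # predict the next difference
        last_level = last_level + last_diff   # integrate back to the original scale
        forecasts.append(last_level)
    return phi, forecasts


# Made-up series whose increments halve each step: 8, 4, 2, 1.
phi, forecasts = arima_110_forecast([0, 8, 12, 14, 15], steps=2)
print(phi, forecasts)  # 0.5 [15.5, 15.75]
```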

5. Prophet (Facebook)

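Fits trend and seasonality automatically; holiday effects can be included when you supply them.

from prophet import Prophet

# Assumes df is a pandas DataFrame with exactly two columns: date first, value second.
df.columns = ['ds', 'y']  # Prophet convention: 'ds' = date, 'y' = target
model = Prophet()
model.fit(df)

# Extend the dates 30 periods (days by default) past the history, then predict.
future = model.make_future_dataframe(periods=30)
forecast = model.predict(future)  # includes yhat, yhat_lower, yhat_upper

Easy to use and works well for business time series.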

6. ML / Deep learning

XGBoost with lag features, LSTM, Transformers.

Complex, but powerful on large datasets with many features.
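Lag features turn forecasting into ordinary supervised learning: each row holds earlier values of the series, and the target is the current value. A minimal sketch with a hypothetical helper and made-up numbers; the resulting rows could be passed to any regressor, a gradient-boosting model among them.

```python
def make_lag_features(y, lags):
    """Return (X, target): each row of X holds y[t - lag] for every lag in `lags`."""
    max_lag = max(lags)
    X, target = [], []
    for t in range(max_lag, len(y)):
        X.append([y[t - lag] for lag in lags])  # only values from before time t
        target.append(y[t])
    return X, target


y = [10, 12, 11, 15, 14, 18, 17, 21]  # made-up series
X, target = make_lag_features(y, lags=[1, 2, 3])
print(X[0], target[0])  # [11, 12, 10] 15
```

The first max(lags) observations produce no rows, because their lagged values do not exist yet.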

Metrics

MAE

Mean Absolute Error:

MAE = mean(|actual - predicted|)

MSE / RMSE

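MSE averages the squared errors, so large errors are penalized more heavily. RMSE is its square root, expressed in the same units as the data.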

MAPE

Mean Absolute Percentage Error:

MAPE = mean(|actual - predicted| / |actual|) × 100%

A relative measure, but it breaks down when actual is close to 0.

SMAPE

Symmetric MAPE. It divides by the average of |actual| and |predicted|, so it stays defined when actual is 0 (unless the prediction is also 0).
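The four metrics side by side, as a minimal sketch. SMAPE has several variants in the literature; the one below, with the mean of |actual| and |predicted| in the denominator, is an assumption, as is the choice to count a term as 0 when both values are 0. The example numbers are made up.

```python
import math


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Percentage error; raises ZeroDivisionError if any actual value is 0."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric variant: denominator is the mean of |actual| and |predicted|."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        total += 0.0 if denom == 0 else abs(a - p) / denom  # assumption: 0/0 counts as 0
    return 100 * total / len(actual)


actual = [100, 200, 400]
predicted = [110, 180, 400]
print(mae(actual, predicted))    # 10.0
print(rmse(actual, predicted))   # about 12.91
print(mape(actual, predicted))   # about 6.67
print(smape(actual, predicted))  # about 6.68
```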

Stationarity

A series is stationary if its mean, variance, and autocorrelation are stable over time.

Many methods (ARIMA, for example) require a stationary series. A non-stationary series can often be made stationary by differencing (Y_t - Y_{t-1}).

Tests:

  • ADF (Augmented Dickey-Fuller)
  • KPSS
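A quick way to see what differencing does is to compare the mean of the first and second halves of a series before and after. This is only a crude sanity check on made-up data with hypothetical helper names, not a replacement for the formal tests listed above.

```python
def difference(y, lag=1):
    """Differenced series: y[t] - y[t - lag]."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]


def half_means(y):
    """Mean of the first half and of the second half of the series."""
    mid = len(y) // 2
    return sum(y[:mid]) / mid, sum(y[mid:]) / (len(y) - mid)


trending = [50 + 2 * t for t in range(20)]  # made-up series with a linear trend
print(half_means(trending))              # (59.0, 79.0): the mean drifts
print(half_means(difference(trending)))  # (2.0, 2.0): the mean is stable
```

Passing lag=7 or lag=12 gives seasonal differencing for weekly or monthly patterns.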

Train/test split

Not random! Split chronologically:

Train: months 1-24
Test: months 25-30

The training period always comes before the test period.

For cross-validation, use a rolling or an expanding window.
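A minimal sketch of both ideas: a chronological holdout split and expanding-window folds. The function names and fold sizes are assumptions chosen for illustration.

```python
def chronological_split(y, test_size):
    """Last `test_size` points form the test set; everything earlier is train."""
    if not 0 < test_size < len(y):
        raise ValueError("test_size must be between 1 and len(y) - 1")
    return y[:-test_size], y[-test_size:]


def expanding_window_splits(n, initial_train, horizon):
    """Yield (train_indices, test_indices); the training window grows each fold."""
    train_end = initial_train
    while train_end + horizon <= n:
        yield range(0, train_end), range(train_end, train_end + horizon)
        train_end += horizon


months = list(range(1, 31))
train, test = chronological_split(months, test_size=6)
print(train[-1], test)  # 24 [25, 26, 27, 28, 29, 30]

folds = [
    (len(tr), list(te))
    for tr, te in expanding_window_splits(30, initial_train=12, horizon=6)
]
print(len(folds))  # 3 folds, with 12, 18, and 24 training points
```

A rolling window differs only in that the start of the training range also moves forward, so the training size stays fixed.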

Practical pipeline

  1. EDA: plot, decomposition, check seasonality
  2. Preprocess: handle missing values and outliers, log-transform if the data are heavily skewed
  3. Split train/test
  4. Baseline: naive, moving average
  5. Try models: ARIMA, Prophet, XGBoost
  6. Evaluate: on test set
  7. Iterate
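Steps 3, 4, and 6 of the list above can be sketched as a small walk-forward evaluation: hold out the last points, forecast each one from the data that precedes it, and compare candidates by MAE. The series and helper names are made up; any model that maps a history to a forecast could be plugged in the same way.

```python
def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def walk_forward(y, n_test, forecaster):
    """One-step-ahead forecasts for the last n_test points, using only earlier data."""
    return [forecaster(y[:t]) for t in range(len(y) - n_test, len(y))]


def naive(history):
    return history[-1]


def moving_avg_3(history):
    return sum(history[-3:]) / 3


y = [100, 104, 99, 103, 101, 105, 100, 104, 102, 106]  # made-up series
n_test = 4
actual = y[-n_test:]

scores = {
    model.__name__: mae(actual, walk_forward(y, n_test, model))
    for model in (naive, moving_avg_3)
}
print(scores)  # {'naive': 3.75, 'moving_avg_3': 2.5}
```

Any more complex model should be judged against these numbers on the same holdout.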

Practical code

import pandas as pd
from prophet import Prophet

# Data: expects a CSV with 'date' and 'sales' columns
df = pd.read_csv('sales.csv')
df['ds'] = pd.to_datetime(df['date'])
df['y'] = df['sales']

# Model
model = Prophet(
    yearly_seasonality=True,
    weekly_seasonality=True,
    changepoint_prior_scale=0.05  # trend flexibility; 0.05 is Prophet's default
)
model.fit(df[['ds', 'y']])

# Forecast 30 days past the end of the history
future = model.make_future_dataframe(periods=30)
forecast = model.predict(future)

# Visualize: history, forecast, and uncertainty interval
fig = model.plot(forecast)

Problems

Data leakage

Accidentally using future information in training. Be careful with feature engineering: every feature for time t must be computable from data available before t.
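A typical leak is a rolling statistic whose window includes the value being predicted. The sketch below contrasts a leaky rolling mean with a safe one; the function names and numbers are made up for illustration.

```python
def rolling_mean_leaky(y, window):
    """WRONG as a feature for predicting y[t]: the window ends at t and includes y[t]."""
    return [
        sum(y[t - window + 1 : t + 1]) / window if t >= window - 1 else None
        for t in range(len(y))
    ]


def rolling_mean_safe(y, window):
    """Uses only y[t - window] .. y[t - 1], values known before time t."""
    return [
        sum(y[t - window : t]) / window if t >= window else None
        for t in range(len(y))
    ]


y = [10, 20, 30, 40, 50]
print(rolling_mean_leaky(y, 2))  # [None, 15.0, 25.0, 35.0, 45.0]
print(rolling_mean_safe(y, 2))   # [None, None, 15.0, 25.0, 35.0]
```

The safe version is the leaky one shifted by one step, which is exactly the shift that is easy to forget.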

Overfitting

A complex model fit on a short history generalizes poorly. Validate on held-out data.

Missed seasonality

If the history is short (for example, less than one full yearly cycle), seasonality cannot be captured.

Structural breaks

After a regime change (COVID, a new competitor), old patterns no longer hold.

Use cases

Revenue forecasting

Budget planning.

Inventory

Demand prediction.

Staffing

Shift planning for call centers and retail.

Web traffic

Capacity planning.

A/B tests

Counterfactual: "what would have happened without the treatment".

In the interview

"Which forecasting methods do you know?" Naive, moving average, ARIMA, Prophet, XGBoost.

"How do you split train and test?" Chronologically, not randomly.

"Which metrics?" MAE, RMSE, MAPE.

"Baseline first?" Always. The goal is to beat the baseline, not to reach some absolute accuracy.

Common mistakes

Random split

Never split a time series randomly.

Only one metric

Different metrics tell different stories.

Ignore seasonality

Business time series almost always have some seasonality.

Chasing best model

A simple model plus domain knowledge beats a complex model without it.


FAQ

Python vs R?

Python for integration with the rest of the stack, R for statistical depth. Both are fine.

LSTM worth it?

Sometimes. Classical methods are often enough for business time series.

Always use Prophet?

A good default, but not always the best. Always compare against alternatives.

