Feature engineering for time series in a Data Scientist interview
Lag features
Past values of the series used as features.
df['lag_1'] = df['sales'].shift(1)
df['lag_7'] = df['sales'].shift(7)
df['lag_30'] = df['sales'].shift(30)
Daily, weekly, monthly lags are common.
Caveat. Do not leak the future: at time t, a feature may use only data from before t.
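The caveat is easiest to break when one table holds several series. Below is a minimal sketch, assuming a hypothetical long-format table with `store`, `date`, and `sales` columns (these names and values are illustrative, not from the article). It sorts by time and shifts inside each series, so a lag never picks up a value from another store or from a later date.

```python
import pandas as pd

# Hypothetical toy data: daily sales for two stores.
df = pd.DataFrame({
    "store": ["a"] * 5 + ["b"] * 5,
    "date": list(pd.date_range("2026-01-01", periods=5)) * 2,
    "sales": [10, 12, 11, 15, 14, 100, 90, 95, 105, 110],
})

# Sort first: shift() works on row order, not on the dates themselves.
df = df.sort_values(["store", "date"]).reset_index(drop=True)

# Shift within each store so store "b" never receives values from store "a".
df["lag_1"] = df.groupby("store")["sales"].shift(1)
df["lag_2"] = df.groupby("store")["sales"].shift(2)

# Rows without a full history have NaN lags; drop them (or impute) before training.
train = df.dropna(subset=["lag_1", "lag_2"])
```

Note that `shift(k)` means "k rows back", which equals "k days back" only when every series has one row per day with no gaps.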
Rolling statistics
Aggregate the series over a sliding window.
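# shift(1) keeps the current value out of its own window (no target leakage)
df['rolling_mean_7'] = df['sales'].shift(1).rolling(7).mean()
df['rolling_std_30'] = df['sales'].shift(1).rolling(30).std()
df['rolling_max_14'] = df['sales'].shift(1).rolling(14).max()
Useful for capturing trend and volatility.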
Expanding window. Aggregates everything from the start of the series up to t.
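An expanding window can be sketched with pandas `expanding()`. The example below is illustrative only: the values are made up, and the `shift(1)` is an assumption that the feature is used to predict the value at t, so the statistic must be built from strictly earlier observations.

```python
import pandas as pd

sales = pd.Series([10.0, 20.0, 30.0, 40.0])  # hypothetical values

# shift(1) first, so the statistic at t is built only from values before t.
past = sales.shift(1)

# The window grows with every row: all available history up to t - 1.
expanding_mean = past.expanding().mean()  # NaN, 10.0, 15.0, 20.0
expanding_max = past.expanding().max()    # NaN, 10.0, 20.0, 30.0
```

Unlike a rolling window, the expanding statistic never forgets old data, so it works as a slowly changing estimate of the series' long-run level.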
Seasonal features
Cyclic encoding: see "feature engineering".
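import numpy as np
# sin and cos together: sine alone maps two different hours to the same value
df['hour_sin'] = np.sin(2 * np.pi * df['hour'] / 24)
df['hour_cos'] = np.cos(2 * np.pi * df['hour'] / 24)
df['day_of_week'] = df['ts'].dt.dayofweek
df['day_of_month'] = df['ts'].dt.day
df['week_of_year'] = df['ts'].dt.isocalendar().week
Holiday / event features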
import holidays
ru_holidays = holidays.RU()  # build the calendar once, not on every row
df['is_holiday'] = df['date'].apply(lambda d: d in ru_holidays)
df['is_weekend'] = df['day_of_week'].isin([5, 6])  # Monday=0, so 5 and 6 are Saturday and Sunday
Custom events. Marketing campaigns, product launches.
df['campaign_active'] = df['date'].between('2026-05-01', '2026-05-15')
Cross-series features
Aggregate from related series.
Hierarchical. Total category sales (related items).
Geo. Average across nearby stores.
Cluster. Mean within a group of similar items.
Helps when individual series are sparse.
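The hierarchical case above could be sketched as follows. The `item`, `category`, `date`, and `sales` columns and their values are hypothetical. The category total is lagged by one day because the same-day total already contains the item's own value, which would leak the target.

```python
import pandas as pd

# Hypothetical toy data: two items in one category, three days each.
df = pd.DataFrame({
    "date": list(pd.date_range("2026-01-01", periods=3)) * 2,
    "item": ["x"] * 3 + ["y"] * 3,
    "category": ["toys"] * 6,
    "sales": [1, 2, 3, 10, 20, 30],
})

# Total sales per category and day.
cat_daily = (
    df.groupby(["category", "date"], as_index=False)["sales"]
      .sum()
      .rename(columns={"sales": "category_sales"})
      .sort_values(["category", "date"])
)

# Lag the aggregate by one row (one day here, assuming no gaps in dates).
cat_daily["category_sales_lag_1"] = (
    cat_daily.groupby("category")["category_sales"].shift(1)
)

# Attach the lagged category total to every item row.
df = df.merge(
    cat_daily[["category", "date", "category_sales_lag_1"]],
    on=["category", "date"],
    how="left",
)
```

The same pattern should carry over to the geo and cluster variants by replacing `category` with a store-neighbourhood or cluster label and `sum()` with `mean()`.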
Related topics
- ARIMA in DS interviews
- Holt-Winters for DS
- Forecasting system design for DS
- Feature engineering for DS
- Preparing for a Data Scientist interview
FAQ
Is this official information?
No. The article is based on common time series ML practices.