Recsys ML system design in a Data Scientist interview
Problem statement
E-commerce / streaming recsys.
Constraints:
- 100M users, 10M items.
- Real-time recommendations.
- Continuous catalog growth.
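A quick back-of-envelope check of these constraints helps motivate the multi-stage design used later. The sketch below assumes 64-dimensional float32 embeddings; the dimension and the helper name `embedding_table_gb` are illustrative assumptions, not part of the problem statement.

```python
# Rough sizing under the stated constraints (100M users, 10M items).
# EMBEDDING_DIM and float32 storage are assumptions made for illustration.
NUM_USERS = 100_000_000
NUM_ITEMS = 10_000_000
EMBEDDING_DIM = 64


def embedding_table_gb(num_rows, dim, bytes_per_value=4):
    """Memory for a dense embedding table, in decimal gigabytes."""
    return num_rows * dim * bytes_per_value / 1e9


user_table_gb = embedding_table_gb(NUM_USERS, EMBEDDING_DIM)
item_table_gb = embedding_table_gb(NUM_ITEMS, EMBEDDING_DIM)

# Scoring every user against every item is 10^15 pairs, which is why
# a cheap candidate generation step normally precedes the ranking model.
all_pairs = NUM_USERS * NUM_ITEMS

print(f"user embeddings: {user_table_gb:.1f} GB")
print(f"item embeddings: {item_table_gb:.2f} GB")
print(f"user-item pairs: {all_pairs:.0e}")
```

Under these assumptions the item table fits comfortably in memory on one machine, while the user table is more likely to be sharded or served from a feature store.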
Hybrid approaches
Collaborative filtering. "Users like you also liked". Strong on established items and users with enough interaction history.
Content-based. Item embeddings (text via BERT, images). Strong for cold start.
Combined. Multi-stage:
Stage 1: Candidate generation
- Two-tower CF (top-1000).
- Content-based for new items (top-100).
- Trending / popular for cold start (top-50).
Merged and deduplicated → up to ~1150 candidates (1000 + 100 + 50).
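The Stage 1 merge can be sketched roughly as follows. This is a minimal illustration, not a production retrieval layer: the source names, quotas, and the `merge_candidates` helper are hypothetical, and each source is assumed to return item IDs already sorted by its own score.

```python
def merge_candidates(sources, quotas):
    """Union the top-N items of each candidate source, deduplicated.

    sources: dict of source name -> list of item IDs, best first.
    quotas:  dict of source name -> how many items to take from it.
    Returns a dict of item ID -> list of sources that proposed it,
    in first-seen order (Python dicts preserve insertion order).
    """
    merged = {}
    for name, ranked_items in sources.items():
        for item_id in ranked_items[: quotas[name]]:
            merged.setdefault(item_id, []).append(name)
    return merged


# Toy example with tiny quotas; the article's quotas would be 1000 / 100 / 50.
sources = {
    "two_tower": ["i1", "i2", "i3"],
    "content_new_items": ["i9", "i2"],
    "trending": ["i1", "i7"],
}
quotas = {"two_tower": 2, "content_new_items": 2, "trending": 1}

candidates = merge_candidates(sources, quotas)
print(candidates)
```

Keeping the list of sources per item is a design choice: the ranker in Stage 2 can use "which retriever proposed this item" as a feature.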
Stage 2: Ranking model on combined features.
Stage 3: Re-ranking for diversity and business rules.
Cold start
New user.
- Onboarding survey.
- Demographic-based default.
- Trending recommendations.
- Popular items.
New item.
- Content embedding (text / image).
- Two-tower model with item features (not just the item ID).
- Bandit exploration: show the item to a random subset of users and learn quickly from their feedback.
New domain / category. Transfer learning from similar domains or categories.
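One common way to implement bandit exploration for new items is Thompson sampling with a Beta prior on each item's click probability. The sketch below uses that technique as an assumption; the text above only says "bandit exploration". The class name, item IDs, and simulated click rates are hypothetical.

```python
import random


class BetaBernoulliBandit:
    """Thompson sampling over a small set of new items (illustrative)."""

    def __init__(self, item_ids, seed=0):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior per item: [alpha = clicks + 1, beta = skips + 1].
        self.stats = {item: [1.0, 1.0] for item in item_ids}

    def select(self):
        """Sample a plausible CTR per item and show the item with the highest draw."""
        draws = {
            item: self.rng.betavariate(alpha, beta)
            for item, (alpha, beta) in self.stats.items()
        }
        return max(draws, key=draws.get)

    def update(self, item, clicked):
        """Record one impression outcome for the shown item."""
        if clicked:
            self.stats[item][0] += 1.0
        else:
            self.stats[item][1] += 1.0


def simulate(true_ctr, rounds=2000, seed=0):
    """Run the bandit against simulated users; returns impressions per item."""
    bandit = BetaBernoulliBandit(true_ctr.keys(), seed=seed)
    user_rng = random.Random(seed + 1)
    impressions = {item: 0 for item in true_ctr}
    for _ in range(rounds):
        item = bandit.select()
        impressions[item] += 1
        bandit.update(item, clicked=user_rng.random() < true_ctr[item])
    return impressions


# Hypothetical new items with made-up click-through rates.
impressions = simulate({"new_item_good": 0.30, "new_item_weak": 0.05})
print(impressions)
```

Early on both items get traffic because their posteriors are wide; as evidence accumulates, impressions shift toward the item with the higher observed click rate.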
Online learning
In addition to batch training:
- Real-time feature updates (last action seconds ago).
- Streaming model updates (Vowpal Wabbit, FTRL).
- Bandit exploration / exploitation.
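To make the streaming-update idea concrete, here is a minimal pure-Python sketch of the per-coordinate FTRL-Proximal update for logistic regression on sparse binary features. It does not use Vowpal Wabbit; the hyperparameter values and feature strings are assumptions chosen for illustration.

```python
import math
from collections import defaultdict


class FTRLProximal:
    """Online logistic regression with FTRL-Proximal updates (sketch)."""

    def __init__(self, alpha=0.1, beta=1.0, l1=0.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # per-feature adjusted gradient sums
        self.n = defaultdict(float)  # per-feature sums of squared gradients

    def _weight(self, feature):
        """Weights are derived lazily from z and n; L1 zeroes small ones."""
        z = self.z[feature]
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        denom = (self.beta + math.sqrt(self.n[feature])) / self.alpha + self.l2
        return -(z - sign * self.l1) / denom

    def predict(self, features):
        """Click probability for a set of active (value = 1) features."""
        score = sum(self._weight(f) for f in features)
        score = max(min(score, 35.0), -35.0)  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, features, label):
        """One streaming step on a single event; label is 0 or 1."""
        p = self.predict(features)
        g = p - label  # log-loss gradient for each active binary feature
        for f in features:
            w = self._weight(f)
            sigma = (math.sqrt(self.n[f] + g * g) - math.sqrt(self.n[f])) / self.alpha
            self.z[f] += g - sigma * w
            self.n[f] += g * g
        return p


model = FTRLProximal()
for _ in range(200):
    model.update(["segment=a", "item=x"], 1)  # segment a clicks on item x
    model.update(["segment=b", "item=x"], 0)  # segment b does not

p_a = model.predict(["segment=a", "item=x"])
p_b = model.predict(["segment=b", "item=x"])
print(round(p_a, 3), round(p_b, 3))
```

The model state is just two numbers per feature, which is what makes this family of updates practical for event-by-event training between batch retrains.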
Real-time updates
User clicks → event → feature store updated.
User: viewed_item_recently=item_xyz, last_active=now.
Next request: ranker uses fresh features.
Time-decay engagement signals: recent actions matter more.
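A time-decayed engagement signal can be maintained incrementally, so the feature store keeps one number and one timestamp per counter. The sketch below assumes exponential decay with a configurable half-life; the `DecayedCounter` class and the one-hour half-life are illustrative assumptions.

```python
class DecayedCounter:
    """Exponentially decayed event counter, updated one event at a time."""

    def __init__(self, half_life_s):
        self.half_life_s = half_life_s
        self.value = 0.0
        self.last_ts = None  # timestamp (seconds) of the latest update

    def _decayed(self, now):
        """Stored value decayed from last_ts to now."""
        if self.last_ts is None:
            return 0.0
        dt = max(0.0, now - self.last_ts)
        return self.value * 0.5 ** (dt / self.half_life_s)

    def add(self, ts, weight=1.0):
        """Register an event, e.g. a click (weight 1) or a purchase (higher)."""
        self.value = self._decayed(ts) + weight
        # Late (out-of-order) events are treated as if they arrived at last_ts,
        # which is a simplification.
        self.last_ts = ts if self.last_ts is None else max(self.last_ts, ts)

    def read(self, now):
        """Feature value as seen by the ranker at request time."""
        return self._decayed(now)


clicks = DecayedCounter(half_life_s=3600)  # one-hour half-life (assumption)
clicks.add(ts=0)
after_one_hour = clicks.read(now=3600)     # the old click counts half
clicks.add(ts=3600)
after_second_click = clicks.read(now=3600)
print(after_one_hour, after_second_click)
```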
Metrics
Offline:
- NDCG, Recall@K, Hit@K.
- Coverage (catalog diversity).
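The offline metrics above can be computed per user roughly as follows. This is a sketch assuming binary relevance and a ranked list without duplicate items; the function names are illustrative, not from a specific library.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Share of the user's relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def hit_at_k(ranked, relevant, k):
    """1.0 if at least one relevant item is in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def ndcg_at_k(ranked, relevant, k):
    """NDCG with binary gains: a hit at 0-based position p adds 1 / log2(p + 2)."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def catalog_coverage(recommendation_lists, catalog_size):
    """Share of the catalog that appears in at least one user's list."""
    shown = set()
    for items in recommendation_lists:
        shown.update(items)
    return len(shown) / catalog_size


ranked = ["a", "b", "c", "d"]   # model output for one user, best first
relevant = {"b", "d"}           # items the user actually interacted with

print(recall_at_k(ranked, relevant, k=2))
print(hit_at_k(ranked, relevant, k=2))
print(round(ndcg_at_k(ranked, relevant, k=4), 4))
print(catalog_coverage([["a", "b"], ["b", "c"]], catalog_size=10))
```

In practice these per-user values are averaged over a held-out set of users, usually with a time-based split so the model is not evaluated on interactions that precede its training data.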
Online:
- CTR.
- Conversion rate.
- Revenue per user.
- Retention long-term.
- Diversity / serendipity.
Beware of clickbait optimization: short-term CTR gains can come at the cost of long-term retention, which is what matters.
Related topics
- Collaborative filtering for DS
- Two-tower DSSM for DS
- Feed ranking system design for DS
- Multi-armed bandit for DS
- Preparing for a Data Scientist interview
FAQ
Is this official information?
No. The article is based on industry recsys practices (Spotify, Netflix, YouTube).