Feed ranking ML system design in a Data Scientist interview
Problem statement
Social feed (TikTok / Instagram / VK): show the top-N items to each user.
Constraints:
- Catalog of 1B+ items.
- 1M concurrent users.
- p99 latency < 200 ms.
- New items must enter the feed in real time.
Multi-stage architecture
1B items
↓ Candidate generation
1000 candidates
↓ Ranking
100 ranked
↓ Re-ranking + diversity
10 final → user
Each stage makes a different cost/accuracy trade-off: early stages run cheap models over many items, later stages run expensive models over few.
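The funnel can be sketched as three functions chained together. This is a minimal illustration, not a production design: the function names, the toy scoring lambdas, and the small catalog are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

Item = int  # hypothetical item id


def generate_candidates(catalog: List[Item], cheap_score: Callable[[Item], float], k: int) -> List[Item]:
    """Stage 1: cheap scoring over the whole catalog, keeps the k best items."""
    return sorted(catalog, key=cheap_score, reverse=True)[:k]


def rank(candidates: List[Item], expensive_score: Callable[[Item], float], k: int) -> List[Item]:
    """Stage 2: a costlier score, applied only to the surviving candidates."""
    return sorted(candidates, key=expensive_score, reverse=True)[:k]


def rerank(ranked: List[Item], k: int) -> List[Item]:
    """Stage 3: placeholder for diversity and business rules; here it only truncates."""
    return ranked[:k]


def build_feed(
    catalog: List[Item],
    cheap_score: Callable[[Item], float],
    expensive_score: Callable[[Item], float],
    sizes: Tuple[int, int, int] = (1000, 100, 10),
) -> List[Item]:
    n_candidates, n_ranked, n_final = sizes
    candidates = generate_candidates(catalog, cheap_score, n_candidates)
    ranked = rank(candidates, expensive_score, n_ranked)
    return rerank(ranked, n_final)


# Toy usage: 10,000 items stand in for the 1B catalog.
feed = build_feed(
    catalog=list(range(10_000)),
    cheap_score=lambda item: item % 1000,          # stand-in for a retrieval score
    expensive_score=lambda item: -abs(item - 5000),  # stand-in for a ranking model
)
print(feed)
```

The point of the structure is that `expensive_score` is called on at most 1000 items per request, no matter how large the catalog grows.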
Candidate generation
Goal. Reduce 1B → ~1000 items.
Methods:
- Embedding retrieval (two-tower): ANN search for the top-1000 items.
- Friend-based: items liked by friends.
- Trending: globally popular items.
- Recent: items published in the last hour.
- Mix: pull from multiple sources, dedupe, merge.
Latency budget: ~50 ms.
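Retrieval from several sources followed by dedupe and merge might look like the sketch below. Several things here are assumptions for illustration: the source names, the round-robin merge order, and the exact dot-product search, which stands in for a real ANN index (a production system would not scan every item).

```python
import heapq
from itertools import zip_longest
from typing import Dict, List


def retrieve_by_embedding(user_vec: List[float], item_vecs: Dict[int, List[float]], k: int) -> List[int]:
    """Exact top-k by dot product; a brute-force stand-in for ANN search."""
    scores = {
        item_id: sum(u * v for u, v in zip(user_vec, vec))
        for item_id, vec in item_vecs.items()
    }
    return heapq.nlargest(k, scores, key=scores.get)


def merge_candidates(sources: Dict[str, List[int]], limit: int) -> List[int]:
    """Round-robin over sources, skipping item ids that were already taken."""
    if limit <= 0:
        return []
    seen = set()
    merged = []
    # Each round takes one item per source; exhausted sources yield None.
    for round_items in zip_longest(*sources.values()):
        for item in round_items:
            if item is None or item in seen:
                continue
            seen.add(item)
            merged.append(item)
            if len(merged) == limit:
                return merged
    return merged


# Toy data: 2-dimensional embeddings and hand-written source lists.
item_vecs = {101: [0.9, 0.1], 102: [0.8, 0.5], 103: [0.1, 0.9], 104: [0.5, 0.5]}
sources = {
    "two_tower": retrieve_by_embedding([1.0, 0.0], item_vecs, k=3),
    "friends_liked": [102, 205],
    "trending": [301, 101, 302],
    "recent": [401],
}
candidates = merge_candidates(sources, limit=6)
print(candidates)
```

Round-robin keeps any single source from filling the whole candidate set; per-source quotas or learned blending are equally plausible choices.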
Ranking
Goal. Score the ~1000 candidates with a heavier, more accurate model.
Model. Wide & Deep, DCN, or a Transformer-based ranker.
Features:
- User: profile, history, embeddings.
- Item: content embeddings, freshness, author.
- Cross: user-item interactions, user-author affinity.
- Context: time of day, device.
Multi-task. Predict click, like, share, and watch_time, then combine the predictions into one score using business weights.
Latency: ~50-100 ms.
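Combining per-task predictions into a single ranking score can be as simple as a weighted sum. The sketch below assumes that form; the weight values and the two example items are hypothetical, and a real system would tune the weights against online metrics.

```python
from typing import Dict

# Hypothetical business weights, not taken from any real system.
WEIGHTS = {"click": 1.0, "like": 2.0, "share": 4.0, "watch_time": 0.05}


def combined_score(preds: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of per-task predictions.

    click/like/share are probabilities; watch_time is expected seconds,
    so its weight also acts as a unit conversion.
    """
    return sum(weights[task] * preds[task] for task in weights)


# Toy model outputs for two candidates.
predictions = {
    "clickbait": {"click": 0.30, "like": 0.01, "share": 0.00, "watch_time": 3.0},
    "good_video": {"click": 0.10, "like": 0.05, "share": 0.02, "watch_time": 40.0},
}
ranked = sorted(predictions, key=lambda name: combined_score(predictions[name]), reverse=True)
print(ranked)
```

With these weights the item with the higher click probability still ranks second, because its predicted watch time and shares are low.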
Re-ranking
Goal. Finalize the top 10.
- Diversity. Do not show 10 cat photos; penalize items similar to ones already selected.
- Position bias correction.
- Business rules. Sponsored content slots, regulatory requirements.
- Freshness boost.
Latency: ~10 ms.
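One simple way to penalize similar items is a greedy pass that lowers an item's score for every already selected item from the same category. The article does not name a specific method, so this scheme, the category-equality notion of similarity, and the `penalty` value are all assumptions for illustration.

```python
from typing import Dict, List, Tuple

# (item name, ranking score, category); all values are made up.
RankedItem = Tuple[str, float, str]


def rerank_with_diversity(items: List[RankedItem], k: int, penalty: float = 0.3) -> List[str]:
    """Greedy selection with a per-category penalty.

    At each step, pick the item with the highest adjusted score:
    score - penalty * (number of already selected items in its category).
    """
    remaining = list(items)
    selected: List[str] = []
    category_counts: Dict[str, int] = {}
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda it: it[1] - penalty * category_counts.get(it[2], 0))
        remaining.remove(best)
        selected.append(best[0])
        category_counts[best[2]] = category_counts.get(best[2], 0) + 1
    return selected


ranked_items = [
    ("cat1", 0.95, "cats"),
    ("cat2", 0.90, "cats"),
    ("cat3", 0.85, "cats"),
    ("dog1", 0.70, "dogs"),
    ("news1", 0.65, "news"),
]
print(rerank_with_diversity(ranked_items, k=3))
print(rerank_with_diversity(ranked_items, k=3, penalty=0.0))
```

Setting `penalty=0.0` reproduces the pure ranking order, which makes the effect of the diversity step easy to compare in an experiment.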
Metrics
Offline.
- NDCG@10.
- Hit@10.
- AUC per task.
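NDCG@10 and Hit@10 can be computed per user from the relevance labels of the ranked list, then averaged over users. The sketch below uses linear gain (the label itself rather than 2^rel - 1) and builds the ideal ordering from the same list, which assumes every relevant item for the user appears in it; both choices are assumptions.

```python
import math
from typing import List


def dcg_at_k(relevances: List[float], k: int) -> float:
    """Discounted cumulative gain: the item at 0-based position p is discounted by log2(p + 2)."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: List[float], k: int) -> float:
    """DCG normalized by the DCG of the best possible ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def hit_at_k(relevances: List[float], k: int) -> float:
    """1.0 if at least one relevant item appears in the top k, else 0.0."""
    return 1.0 if any(rel > 0 for rel in relevances[:k]) else 0.0


# Relevance labels in the order the model ranked the items (1 = user engaged).
shown = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(round(ndcg_at_k(shown, 10), 4), hit_at_k(shown, 10))
```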
Online (A/B test).
- CTR (click-through rate).
- Engagement (likes, shares, comments).
- Time spent.
- Daily active users.
- Long-term retention.
In production, short-term and long-term metrics can conflict: clickbait raises CTR but lowers retention.
Related topics
- Two-tower DSSM for DS
- Collaborative filtering for DS
- Ranking metrics (NDCG) for DS
- Multi-task learning for DS
- Preparing for a Data Scientist interview
FAQ
Is this official information?
No. The article is based on industry practice (papers from YouTube, TikTok, Pinterest).