Curriculum learning in the Data Scientist interview
The idea of curriculum learning
As with human learning, the model sees easier examples first and harder ones later. This can improve convergence and generalization.
Bengio et al. (2009) formalized the idea and showed that the order in which training data is presented matters.
Easy-to-hard
Sort the training data by difficulty. Start with the easy examples and gradually add harder ones.
Difficulty measures:
- Predicted loss (a lower predicted loss means an easier sample).
- Length (short sequences first).
- Confidence (the current model's confidence; high confidence = easy).
- External (manually labeled difficulty).
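To make the difficulty measures concrete, here is a minimal sketch of two of them (length and confidence) plus an easy-to-hard sort. The function names `length_difficulty`, `confidence_difficulty` and `sort_easy_to_hard` are hypothetical, and the score "1 minus the probability of the true label" is just one assumed way to turn confidence into a difficulty.

```python
def length_difficulty(tokens):
    """Length-based difficulty: longer sequences count as harder."""
    return len(tokens)


def confidence_difficulty(probs, label):
    """Confidence-based difficulty (assumed form): 1 - p(true label).

    High model confidence in the correct label gives a low score (easy).
    """
    return 1.0 - probs[label]


def sort_easy_to_hard(samples, difficulty):
    """Return samples ordered from easiest to hardest (sorted() is stable)."""
    return sorted(samples, key=difficulty)


# Toy data: three tokenized sentences of different lengths.
sentences = [["a", "b", "c"], ["a"], ["a", "b"]]
ordered = sort_easy_to_hard(sentences, length_difficulty)
```

Any function that maps a sample to a number can be passed as `difficulty`, so switching from length to a loss-based or manually labeled score only changes the key.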
Epoch 1: top 30% easiest.
Epoch 5: top 60%.
Epoch 10: all.
Anti-curriculum. Hard examples first; this sometimes works for robustness.
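The epoch schedule above can be sketched as a pacing function. This is only an illustration: the names `curriculum_fraction` and `curriculum_subset` are hypothetical, and the linear interpolation between the listed epochs is an assumption (the schedule only gives values for epochs 1, 5 and 10).

```python
def curriculum_fraction(epoch, milestones=((1, 0.3), (5, 0.6), (10, 1.0))):
    """Fraction of the easiest samples to train on at a 1-based epoch."""
    first_epoch, first_frac = milestones[0]
    if epoch <= first_epoch:
        return first_frac
    for (e0, f0), (e1, f1) in zip(milestones, milestones[1:]):
        if epoch <= e1:
            # Assumption: interpolate linearly between neighbouring milestones.
            return f0 + (f1 - f0) * (epoch - e0) / (e1 - e0)
    # Past the last milestone: use the final fraction (all data).
    return milestones[-1][1]


def curriculum_subset(sorted_samples, epoch, reverse=False):
    """Pick the training pool for this epoch from easy-to-hard sorted samples.

    reverse=True flips the order, giving a hard-first anti-curriculum.
    """
    pool = sorted_samples[::-1] if reverse else sorted_samples
    k = max(1, round(curriculum_fraction(epoch) * len(pool)))
    return pool[:k]


# Ten samples already sorted from easiest (0) to hardest (9).
samples = list(range(10))
epoch1_pool = curriculum_subset(samples, epoch=1)
epoch10_pool = curriculum_subset(samples, epoch=10)
```

A step schedule (hold 30% until epoch 5, then jump to 60%) would fit the same interface; the interpolation is just one reasonable choice.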
Self-paced learning
The model itself decides which samples are easy and which are hard.
loss = main_loss + λ · regularizer(weights, sample_difficulty)
Samples with low loss get a high weight (used). Samples with high loss get a low weight (skipped).
λ is decreased over training, which weakens the penalty on hard samples, so the model gradually accepts them.
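A minimal sketch of the weighting step, assuming the simplest hard-threshold variant: a sample gets weight 1 when its current loss is below a cutoff and weight 0 otherwise. The names `self_paced_weights`, `weighted_loss` and `threshold` are hypothetical. Loosening `threshold` over training stands in for annealing λ here; the exact mapping depends on how the regularizer is defined.

```python
def self_paced_weights(losses, threshold):
    """Hard weights: 1.0 (used) if loss < threshold, else 0.0 (skipped)."""
    return [1.0 if loss < threshold else 0.0 for loss in losses]


def weighted_loss(losses, weights):
    """Mean loss over the selected samples; 0.0 if nothing is selected."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * l for w, l in zip(weights, losses)) / total


# Per-sample losses from the current model (toy numbers).
losses = [0.2, 0.9, 1.5, 3.0]

# Loosen the cutoff step by step so harder samples are admitted later.
selected_counts = []
for threshold in (0.5, 1.0, 2.0, 4.0):
    weights = self_paced_weights(losses, threshold)
    selected_counts.append(int(sum(weights)))
```

In a real training loop the losses would be recomputed after each update, so the set of "easy" samples changes as the model improves.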
Applications
LLM training. Order the data by quality / complexity. A common practice at large labs.
RL. Curriculum environments: start with easy levels.
Speech / NLP. Short utterances first.
Imitation learning. Demonstrations with increasing complexity.
Robotics. Easy tasks → general → specific.
Related topics
- Active learning for DS
- Self-supervised learning for DS
- Reinforcement learning for DS
- Few-shot learning for DS
- Preparing for the Data Scientist interview
FAQ
Is this official information?
No. The article is based on the work of Bengio et al. (2009) and Kumar et al. (2010, self-paced learning).