Hadoop YARN in the Data Engineer Interview
YARN Architecture
YARN is the resource manager of the Hadoop ecosystem.
ResourceManager (master) → NodeManagers (workers).
The ResourceManager allocates resources and tracks application status.
Job submitted → RM starts an ApplicationMaster (one per app) → AM negotiates resources with the RM and runs containers on the NMs.
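The submission flow above can be sketched as a toy simulation. This is a minimal illustration, not YARN's real API: the class names, the first-fit placement rule, and the memory sizes are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy worker node with a fixed amount of free memory (MB)."""
    name: str
    free_mb: int
    containers: list = field(default_factory=list)

    def launch(self, container_id: str, mb: int) -> None:
        self.free_mb -= mb
        self.containers.append(container_id)


class ResourceManager:
    """Toy RM: places a container on the first node with enough free memory."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, container_id: str, mb: int):
        for node in self.nodes:
            if node.free_mb >= mb:
                node.launch(container_id, mb)
                return node.name
        return None  # no capacity right now

    def submit(self, app_id: str, tasks: int, task_mb: int, am_mb: int = 512):
        # Step 1: the RM starts one ApplicationMaster container for the app.
        am_node = self.allocate(f"{app_id}-am", am_mb)
        if am_node is None:
            raise RuntimeError("no room for the ApplicationMaster")
        # Step 2: the AM requests task containers, which run on the NMs.
        placements = [
            self.allocate(f"{app_id}-task{i}", task_mb) for i in range(tasks)
        ]
        return am_node, placements


nodes = [NodeManager("nm1", 2048), NodeManager("nm2", 2048)]
rm = ResourceManager(nodes)
am_node, placements = rm.submit("app1", tasks=3, task_mb=1024)
print(am_node, placements)  # nm1 ['nm1', 'nm2', 'nm2']
```

In the real system the AM is a separate process that talks to the RM over RPC; the sketch collapses that into one method call to keep the order of steps visible.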
ResourceManager
The brain of the cluster. It knows the cluster's total resources, tracks running applications, and schedules new ones.
Components:
- Scheduler. Allocates resources according to a policy (Capacity, Fair, FIFO).
- Applications Manager. Manages app submissions.
HA: Active/Standby ResourceManagers, coordinated through ZooKeeper.
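The Active/Standby idea can be illustrated with a small sketch. `LeaderLock` here is a hypothetical in-memory stand-in for the ZooKeeper-based election, and the `RM` class and its methods are invented for the example; none of this is the actual YARN or ZooKeeper API.

```python
class LeaderLock:
    """Hypothetical stand-in for a ZooKeeper-style lock: one holder at a time."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, name: str) -> bool:
        if self.holder is None:
            self.holder = name
        return self.holder == name

    def release(self, name: str) -> None:
        if self.holder == name:
            self.holder = None


class RM:
    """Toy ResourceManager instance that is active only while it holds the lock."""

    def __init__(self, name: str, lock: LeaderLock):
        self.name = name
        self.lock = lock
        self.alive = True

    def state(self) -> str:
        if not self.alive:
            return "down"
        return "active" if self.lock.try_acquire(self.name) else "standby"

    def crash(self) -> None:
        # Assumption: the lock is freed when the failed RM's session ends.
        self.alive = False
        self.lock.release(self.name)


lock = LeaderLock()
rm1, rm2 = RM("rm1", lock), RM("rm2", lock)
before = (rm1.state(), rm2.state())
rm1.crash()
after = (rm1.state(), rm2.state())
print(before, after)  # ('active', 'standby') ('down', 'active')
```

The point to take into an interview is that at most one RM is active at any moment, and the standby takes over only after the active one loses the lock.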
NodeManager
Per-node agent. Reports the node's resources (CPU, memory) and launches the containers that applications run in.
Sends periodic heartbeats to the RM.
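A rough sketch of how the RM side might track heartbeats is shown below. The `HeartbeatTracker` class, the 10-second timeout, and the resource figures are assumptions for illustration, not values taken from YARN.

```python
class HeartbeatTracker:
    """Toy RM-side view of NodeManager liveness and reported resources."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s  # hypothetical expiry interval
        self.last_seen = {}
        self.resources = {}

    def heartbeat(self, node: str, now: float, free_mb: int, free_vcores: int) -> None:
        # Each heartbeat refreshes the timestamp and the node's free resources.
        self.last_seen[node] = now
        self.resources[node] = (free_mb, free_vcores)

    def live_nodes(self, now: float):
        # A node counts as live if it reported within the timeout window.
        return sorted(
            n for n, t in self.last_seen.items() if now - t <= self.timeout_s
        )


tracker = HeartbeatTracker(timeout_s=10.0)
tracker.heartbeat("nm1", now=0.0, free_mb=4096, free_vcores=4)
tracker.heartbeat("nm2", now=0.0, free_mb=8192, free_vcores=8)
tracker.heartbeat("nm1", now=8.0, free_mb=2048, free_vcores=2)
live = tracker.live_nodes(now=15.0)
print(live)  # ['nm1']
```

Here `nm2` stops reporting after its first heartbeat, so by `now=15.0` only `nm1` is still considered live and schedulable.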
Queues
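Queues provide multi-tenant fairness by dividing cluster capacity among teams or workloads. For example:
queue=production: 60% capacity, priority high.
queue=research: 30%, priority low.
queue=adhoc: 10%.
Capacity Scheduler. Gives each queue a guaranteed share of the cluster, bounded by per-queue limits.
Fair Scheduler. Strives for an equal share of resources among users.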
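The two policies can be compared with a short worked example. The functions below are hypothetical helpers, not scheduler code: `queue_guarantees` turns the percentages above into guaranteed memory, and `fair_share` models "equal share" as a simple max-min fair split, which is an assumption about how to approximate the Fair Scheduler's goal.

```python
def queue_guarantees(total_mb: int, percents: dict) -> dict:
    """Guaranteed memory per queue from capacity percentages."""
    if sum(percents.values()) != 100:
        raise ValueError("queue capacities must sum to 100%")
    return {q: total_mb * p // 100 for q, p in percents.items()}


def fair_share(total_mb: int, demands: dict) -> dict:
    """Max-min fair split: equal slices, with unused share redistributed."""
    share = {u: 0 for u in demands}
    remaining = total_mb
    active = {u for u, d in demands.items() if d > 0}
    while active and remaining > 0:
        slice_mb = remaining // len(active)
        if slice_mb == 0:
            break
        for u in sorted(active):
            # Never give a user more than they asked for.
            give = min(slice_mb, demands[u] - share[u])
            share[u] += give
            remaining -= give
        # Users whose demand is met drop out; the rest split what is left.
        active = {u for u in active if share[u] < demands[u]}
    return share


guarantees = queue_guarantees(
    100_000, {"production": 60, "research": 30, "adhoc": 10}
)
shares = fair_share(9000, {"alice": 2000, "bob": 5000, "carol": 5000})
print(guarantees)  # {'production': 60000, 'research': 30000, 'adhoc': 10000}
print(shares)      # {'alice': 2000, 'bob': 3500, 'carol': 3500}
```

In the fair-share case `alice` needs less than an equal third, so her unused portion is split between `bob` and `carol`.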
Vs Kubernetes
| | YARN | Kubernetes |
|---|---|---|
| Origin | Hadoop ecosystem | Container orchestration |
| Container | Hadoop-specific | Docker / OCI |
| Apps | Hadoop / Spark | Anything |
| Modern adoption | Declining | Dominant |
| Cloud-native | Less | Yes |
In 2026, k8s is replacing YARN. Cloud-native data platforms (Databricks, Spark on k8s) run on k8s.
FAQ
Is this official information?
No. The article is based on the Apache Hadoop YARN documentation.