Workload management в DWH на собеседовании Data Engineer
Карьерник — Duolingo для аналитиков: 10 минут в день тренируй SQL, Python, A/B, статистику, метрики и ещё 3 темы собеса. 1500+ вопросов в Telegram-боте. Бесплатно.
Содержание:
Зачем WLM
DWH multiple users / teams / workload types.
- ETL — batch, slow, resource-heavy.
- Analytics — interactive, fast.
- Dashboards — frequent, lightweight.
Без WLM — heavy ETL blocks dashboards.
Query queues
Categorize queries в queues.
ETL queue: max 5 concurrent, 50% memory.
Analyst queue: max 20 concurrent, 30% memory.
Dashboard queue: max 50 concurrent, 20% memory.Routing — by user role, query pattern, hint.
Priorities
Higher priority — more resources, скорее executed.
Priority levels: CRITICAL > HIGH > NORMAL > LOW.
Dashboards = HIGH (interactive).
ETL = NORMAL.
Backfill = LOW.При contention — high priority preempts (или throttles low).
Concurrency limits
Max queries running simultaneously.
Snowflake warehouse: max_concurrency_level = 8.Higher → more throughput, но contention. Lower → predictable performance, but queue grows.
В Redshift — Workload Management slots.
Resource isolation
Different workloads — different clusters / warehouses.
Snowflake. Multiple warehouses один account. Auto-suspend, auto-resume.
WH_ETL — large, batch.
WH_ANALYST — small, on-demand.
WH_DASH — auto-suspend after idle.ClickHouse / Greenplum. Resource pools / queues.
Связанные темы
- DWH cost optimization для DE
- Capacity planning для SA
- Snowflake vs BigQuery для DE
- DWH ClickHouse для DE
- Подготовка к собесу Data Engineer
FAQ
Это официальная информация?
Нет. Статья основана на документации Snowflake / Redshift / Greenplum.
Тренируйте Data Engineering — откройте тренажёр с 1500+ вопросами для собесов.