Apache Druid in the Data Engineer Interview
What is Druid
Apache Druid is an open-source real-time analytics database. It is time-series oriented and designed for sub-second queries over billions of rows.
Use cases: real-time dashboards, click-stream analytics, network performance monitoring.
Architecture
Druid is split into several process types (multi-tier):
- Brokers. Route queries to data nodes and merge the partial results.
- Historicals. Serve data from segments.
- MiddleManagers. Ingest data and create segments.
- Coordinator. Manages how segments are distributed across Historicals.
- Overlord. Manages ingestion tasks.
Metadata store: PostgreSQL or MySQL.
Deep storage: S3 or HDFS.
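The Broker and Historical roles above form a scatter-gather pattern: each Historical aggregates over the segments it holds, and the Broker merges the partial results. A minimal Python sketch of that merge step, assuming each node returns a per-key partial sum. The node names and numbers are hypothetical, and this illustrates the idea rather than Druid's actual implementation.

```python
from collections import defaultdict

# Hypothetical partial results: each Historical has already summed
# revenue per event_type over the segments it serves.
historical_a = {"click": 120.0, "purchase": 900.0}
historical_b = {"click": 80.0, "view": 15.0}


def merge_partials(partials):
    """Broker-style merge: add up per-key partial sums from every node."""
    merged = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            merged[key] += value
    return dict(merged)


result = merge_partials([historical_a, historical_b])
print(result)
```

The merge works because sums are associative: partial sums computed per node can be combined in any order without changing the final value.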
Segments
Druid stores data in segments, which are partitioned by time. An hourly segment, for example, covers:
2026-05-07T00:00:00Z to 2026-05-07T01:00:00Z (hour segment)
Each segment is column-oriented, compressed, and indexed.
Segments are distributed across Historical nodes so that queries can run in parallel.
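Time partitioning can be pictured as truncating each event timestamp to the start of its bucket. The sketch below assumes hourly segment granularity and UTC timestamps; the helper names are hypothetical and only illustrate how an event maps to a segment interval.

```python
from datetime import datetime, timedelta, timezone


def hour_segment_interval(ts):
    """Return the [start, end) hourly interval that contains ts."""
    start = ts.replace(minute=0, second=0, microsecond=0)
    return start, start + timedelta(hours=1)


def fmt(ts):
    """Format a UTC datetime as an ISO-8601 string with a Z suffix."""
    return ts.strftime("%Y-%m-%dT%H:%M:%SZ")


event_time = datetime(2026, 5, 7, 0, 42, 17, tzinfo=timezone.utc)
start, end = hour_segment_interval(event_time)
interval = f"{fmt(start)}/{fmt(end)}"
print(interval)  # 2026-05-07T00:00:00Z/2026-05-07T01:00:00Z
```

A query with a time filter only needs the segments whose intervals overlap that filter, which is why time-bounded queries touch a small part of the data.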
Ingestion
Real-time. The Kafka indexing service reads streams, builds segments in memory, and periodically flushes them to disk.
Batch. Loads data from files (Parquet, CSV) and JDBC sources.
Native ingestion spec. A JSON config that defines the schema, time column, dimensions, and metrics.
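A simplified fragment (in a full spec these fields sit under dataSchema, and the sum aggregator is typed, for example doubleSum):
{
  "dataSource": "events",
  "timestampSpec": {"column": "event_time"},
  "dimensionsSpec": {"dimensions": ["user_id", "event_type"]},
  "metricsSpec": [
    {"type": "count", "name": "count"},
    {"type": "doubleSum", "fieldName": "amount", "name": "revenue"}
  ]
}
Queries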
Druid SQL (since 0.10).
SELECT event_type, SUM(revenue)
FROM events
WHERE __time >= TIMESTAMP '2026-05-01'
GROUP BY 1;
Native JSON queries are available for advanced cases.
Time-bounded queries can return in under a second even on billions of rows.
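Druid SQL is usually sent over HTTP to the SQL endpoint `/druid/v2/sql` as a JSON body with a `query` field. The sketch below only builds the request and does not send it. The host and port are assumptions (8082 is the default Broker port, but deployments differ), and `build_sql_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumption: a Broker reachable at this address; adjust for your cluster.
BROKER_URL = "http://localhost:8082/druid/v2/sql"

SQL = """
SELECT event_type, SUM(revenue)
FROM events
WHERE __time >= TIMESTAMP '2026-05-01'
GROUP BY 1
"""


def build_sql_request(sql, url=BROKER_URL):
    """Build (but do not send) a POST request for the Druid SQL endpoint."""
    body = json.dumps({"query": sql.strip()}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sql_request(SQL)
payload = json.loads(request.data)
print(request.get_method(), request.full_url)
print(payload["query"])

# To run it against a live cluster:
# with urllib.request.urlopen(request) as resp:
#     rows = json.load(resp)
```

Keeping the `__time` filter in the query is what lets Druid prune segments before scanning.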
Druid vs ClickHouse
| | Druid | ClickHouse |
|---|---|---|
| Time-series focus | High | Built-in |
| Real-time ingest | Excellent | Good |
| Joins | Limited | Better |
| SQL | SQL-like | Standard SQL |
| Multi-tier ops | Complex | Simpler |
In Russia, ClickHouse dominates over Druid; Druid remains a niche choice for real-time analytics.
FAQ
Is this official information?
No. The article is based on the Apache Druid documentation.