Streaming SQL in the Data Engineer Interview
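Why streaming SQL
Stream processing traditionally means writing application code in Java, Scala, or Python. Streaming SQL makes it accessible to analysts and other SQL-first specialists: you declare a continuous query, and the engine keeps its result up to date.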
-- streaming aggregation (in ksqlDB an aggregation produces a TABLE, so CREATE TABLE is used)
CREATE TABLE orders_by_country AS
SELECT country, COUNT(*) AS cnt, SUM(amount) AS revenue
FROM orders_stream
GROUP BY country;
The results update automatically as new events arrive.
ksqlDB
SQL on top of Kafka: streams and tables are backed by Kafka topics.
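-- plain JSON carries no schema to infer, so the columns are declared explicitly
-- (names and types are assumed from the queries in this article)
CREATE STREAM orders (country VARCHAR, amount DECIMAL(12, 2))
  WITH (KAFKA_TOPIC='orders', VALUE_FORMAT='JSON');
CREATE TABLE country_revenue AS
SELECT country, SUM(amount) AS total
FROM orders
GROUP BY country
EMIT CHANGES;
EMIT CHANGES makes the query continuous: every change to the result is emitted as it happens. For CREATE TABLE ... AS SELECT, the changes are written to the table's backing Kafka topic, where downstream consumers can read them.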
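A common interview follow-up is the difference between push and pull queries in ksqlDB. The sketch below assumes the `country_revenue` table defined above; the key value `'DE'` is a made-up example.

```sql
-- Push query: stays open and streams every change to the result as it happens.
SELECT country, total
FROM country_revenue
EMIT CHANGES;

-- Pull query: returns the current value for a key and terminates,
-- like a lookup against a regular database table.
SELECT country, total
FROM country_revenue
WHERE country = 'DE';
```

Push queries suit subscribers that react to changes. Pull queries suit request/response lookups against the materialized state.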
Flink SQL
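CREATE TABLE orders (
  order_id BIGINT,
  amount DECIMAL(10, 2),  -- bare DECIMAL means DECIMAL(10, 0), which has no fractional digits
  ts TIMESTAMP(3),
  -- event-time attribute; tolerates events arriving up to 5 seconds out of order
  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
) WITH ('connector' = 'kafka', ...);
-- hourly revenue over tumbling (fixed-size, non-overlapping) windows
SELECT TUMBLE_START(ts, INTERVAL '1' HOUR) AS window_start,
  SUM(amount) AS hourly_revenue
FROM orders
GROUP BY TUMBLE(ts, INTERVAL '1' HOUR);
Standard SQL with windowing extensions. A window's result is emitted once the watermark passes the end of the window.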
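The `GROUP BY TUMBLE(...)` form above is Flink's older group-window syntax. Newer Flink versions also offer windowing table-valued functions, which express the same aggregation as follows. This is a minimal sketch that assumes the `orders` table defined above.

```sql
-- Same hourly revenue, written with the TUMBLE windowing table-valued function.
-- TUMBLE adds window_start and window_end columns to every row of orders.
SELECT window_start,
       window_end,
       SUM(amount) AS hourly_revenue
FROM TABLE(
  TUMBLE(TABLE orders, DESCRIPTOR(ts), INTERVAL '1' HOUR)
)
GROUP BY window_start, window_end;
```

Either form gives the same hourly totals. The table-valued form exposes the window bounds as ordinary columns, which tends to be easier to combine with other operators.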
Materialize
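A Postgres-compatible streaming database.
-- abbreviated; the full CREATE SOURCE syntax (connection, format) depends on the Materialize version
CREATE SOURCE orders FROM KAFKA BROKER '...' TOPIC 'orders';
CREATE MATERIALIZED VIEW country_revenue AS
SELECT country, SUM(amount) AS total FROM orders GROUP BY country;
SELECT * FROM country_revenue; -- always fresh
No manual refresh is needed: the view is maintained incrementally as new events arrive.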
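A plain `SELECT` reads the current state of the view. To receive changes as they happen, Materialize provides `SUBSCRIBE`. The sketch below assumes the `country_revenue` view defined above and a `psql` session. The exact options vary by version, so treat it as illustrative rather than definitive.

```sql
-- Stream changes to the view instead of polling it.
-- Each output row carries a timestamp and a diff
-- (+1 for an inserted row, -1 for a retracted one).
COPY (SUBSCRIBE (SELECT * FROM country_revenue)) TO STDOUT;
```

When a country's total changes, the subscription typically reports a retraction of the old row followed by an insertion of the new one.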
Use cases
Real-time dashboards. The data behind them is always fresh.
Operational analytics. Detect anomalies within seconds of their occurrence.
Triggers and alerts. Stream + window + condition → notification.
Streaming ETL. Kafka topic → cleaning / enrichment → another topic.
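The last two patterns can be sketched in ksqlDB roughly as follows. The sketch assumes the `orders` stream from the ksqlDB section has `country` and `amount` columns. The threshold of 100 orders per minute, the names `order_spikes` and `orders_clean`, and the target topic are all hypothetical.

```sql
-- Alerting: stream + window + condition.
-- Rows appear in order_spikes only for countries that exceed
-- 100 orders within a one-minute tumbling window.
CREATE TABLE order_spikes AS
SELECT country, COUNT(*) AS orders_cnt
FROM orders
WINDOW TUMBLING (SIZE 1 MINUTE)
GROUP BY country
HAVING COUNT(*) > 100
EMIT CHANGES;

-- Streaming ETL: read one topic, clean the records, write another topic.
CREATE STREAM orders_clean
  WITH (KAFKA_TOPIC='orders_clean', VALUE_FORMAT='JSON') AS
SELECT UCASE(country) AS country, amount
FROM orders
WHERE amount > 0
EMIT CHANGES;
```

A separate consumer on the topic behind `order_spikes` would turn those rows into notifications. ksqlDB itself only maintains the result.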
Related topics
- Kafka in the DE interview
- Spark Structured Streaming for DE
- Apache Flink for DE
- Kafka Streams for DE
- Preparing for the Data Engineer interview
FAQ
Is this official information?
No. The article is based on the ksqlDB, Flink, and Materialize documentation.