Задачи по SQL для middle-аналитика
Что ждут от middle
Middle-аналитика проверяют на:
- Продвинутые оконные функции (LAG, LEAD, SUM OVER, ROWS BETWEEN).
- Рекурсивные CTE.
- Session-based analysis.
- Когортные запросы.
- Сложные подзапросы и JOIN.
- Оптимизация запросов.
Ниже 15 задач реального middle-уровня.
1. Когортная таблица retention
WITH cohort AS (
SELECT user_id, DATE_TRUNC('month', MIN(event_time))::DATE AS cohort_month
FROM events GROUP BY user_id
),
activity AS (
SELECT c.user_id, c.cohort_month,
DATE_TRUNC('month', e.event_time)::DATE AS active_month,
EXTRACT(MONTH FROM AGE(DATE_TRUNC('month', e.event_time),
c.cohort_month))::int AS period_num
FROM cohort c
JOIN events e USING (user_id)
)
SELECT cohort_month, period_num,
COUNT(DISTINCT user_id) AS active_users
FROM activity
GROUP BY cohort_month, period_num
ORDER BY cohort_month, period_num;2. Session identification (30-минутный тайм-аут)
WITH diffs AS (
SELECT user_id, event_time,
EXTRACT(EPOCH FROM (event_time -
LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time)
)) / 60 AS min_gap
FROM events
),
sessions AS (
SELECT user_id, event_time,
SUM(CASE WHEN min_gap > 30 OR min_gap IS NULL THEN 1 ELSE 0 END)
OVER (PARTITION BY user_id ORDER BY event_time) AS session_id
FROM diffs
)
SELECT user_id, session_id,
MIN(event_time) AS session_start,
MAX(event_time) AS session_end,
COUNT(*) AS events_count
FROM sessions
GROUP BY user_id, session_id;3. Время между покупками (AVG gap per user)
WITH gaps AS (
SELECT user_id,
EXTRACT(DAY FROM (
created_at - LAG(created_at) OVER (PARTITION BY user_id ORDER BY created_at)
)) AS days_gap
FROM orders
)
SELECT user_id, AVG(days_gap) AS avg_gap_days
FROM gaps
WHERE days_gap IS NOT NULL
GROUP BY user_id;4. Воронка с временным окном (24 часа)
WITH events_with_prev AS (
SELECT user_id, event, event_time,
LAG(event) OVER (PARTITION BY user_id ORDER BY event_time) AS prev,
LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_time
FROM events
)
SELECT
COUNT(DISTINCT user_id) FILTER (WHERE event = 'view') AS step1,
COUNT(DISTINCT user_id) FILTER (
WHERE event = 'cart' AND prev = 'view'
AND event_time - prev_time < INTERVAL '24 hour'
) AS step2,
COUNT(DISTINCT user_id) FILTER (
WHERE event = 'purchase' AND prev = 'cart'
AND event_time - prev_time < INTERVAL '24 hour'
) AS step3
FROM events_with_prev;5. Рекурсивный CTE: иерархия категорий
WITH RECURSIVE tree AS (
SELECT id, name, parent_id, 1 AS level
FROM categories WHERE parent_id IS NULL
UNION ALL
SELECT c.id, c.name, c.parent_id, t.level + 1
FROM categories c JOIN tree t ON c.parent_id = t.id
)
SELECT * FROM tree ORDER BY level, name;Тренироваться на таких вопросах можно в Telegram-боте Карьерник — там 1500+ задач с реальных собесов с разборами.
6. Заполнить пропущенные дни нулями
WITH RECURSIVE cal AS (
SELECT DATE '2026-04-01' AS day
UNION ALL
SELECT day + INTERVAL '1 day' FROM cal WHERE day < DATE '2026-04-30'
)
SELECT c.day, COALESCE(SUM(o.amount), 0) AS revenue
FROM cal c
LEFT JOIN orders o ON o.created_at::DATE = c.day
GROUP BY c.day
ORDER BY c.day;7. Топ-3 продукта в каждой категории, с общей суммой
WITH ranked AS (
SELECT p.category, p.product_id, SUM(o.amount) AS rev,
ROW_NUMBER() OVER (PARTITION BY p.category ORDER BY SUM(o.amount) DESC) AS rn,
SUM(SUM(o.amount)) OVER (PARTITION BY p.category) AS cat_total
FROM products p
JOIN orders o USING (product_id)
GROUP BY p.category, p.product_id
)
SELECT *,
ROUND(100.0 * rev / cat_total, 2) AS pct_of_category
FROM ranked
WHERE rn <= 3;8. Пользователи с прогрессивной активностью (месяц-над-месяцем)
WITH monthly AS (
SELECT user_id, DATE_TRUNC('month', created_at) AS month,
COUNT(*) AS orders
FROM orders GROUP BY 1, 2
),
with_prev AS (
SELECT user_id, month, orders,
LAG(orders) OVER (PARTITION BY user_id ORDER BY month) AS prev_orders
FROM monthly
)
SELECT DISTINCT user_id
FROM with_prev
WHERE orders > prev_orders
AND month >= '2026-01-01';9. Sliding metrics — rolling 7 days unique users
SELECT day,
COUNT(DISTINCT user_id) OVER (
ORDER BY day
RANGE BETWEEN INTERVAL '6 day' PRECEDING AND CURRENT ROW
) AS rolling_dau_7
FROM (
SELECT created_at::DATE AS day, user_id FROM events
) d;10. Simpson's paradox — find aggregated trick
-- Платформа A: конверсия просела по каждому сегменту
-- Но общая конверсия выросла?!
WITH conv AS (
SELECT period, segment,
COUNT(*) FILTER (WHERE converted) * 1.0 / COUNT(*) AS conversion,
COUNT(*) AS volume
FROM events GROUP BY period, segment
)
SELECT period, segment, conversion, volume
FROM conv
ORDER BY period, segment;
-- Посмотреть, как сдвигается mix volume между сегментами11. Upsert через SQL (MERGE-ish в PostgreSQL)
INSERT INTO user_snapshots (user_id, snapshot_date, total_revenue)
SELECT user_id, CURRENT_DATE, SUM(amount)
FROM orders GROUP BY user_id
ON CONFLICT (user_id, snapshot_date) DO UPDATE
SET total_revenue = EXCLUDED.total_revenue;12. Аномалия: z-score на выручку
WITH stats AS (
SELECT AVG(amount) AS mu, STDDEV(amount) AS sd FROM orders
)
SELECT o.order_id, o.amount,
(o.amount - s.mu) / s.sd AS z_score
FROM orders o, stats s
WHERE ABS((o.amount - s.mu) / s.sd) > 3;К слову, набить руку на таких кейсах удобно через тренажёр в Telegram — разбирайте по 10 вопросов в день, через 2 недели тема становится рефлексом.
13. Сложный FILTER с агрегатом
SELECT category,
COUNT(*) AS total,
COUNT(*) FILTER (WHERE amount > 1000 AND status = 'paid') AS paid_large,
SUM(amount) FILTER (WHERE status = 'paid') AS paid_revenue,
AVG(amount) FILTER (WHERE status = 'refunded') AS avg_refund
FROM orders JOIN products USING (product_id)
GROUP BY category;14. Декомпозиция GMV через multi-rollup
SELECT
COALESCE(platform, 'ALL') AS platform,
COALESCE(category, 'ALL') AS category,
SUM(amount) AS gmv
FROM orders JOIN products USING (product_id)
GROUP BY ROLLUP (platform, category);15. Lead/lag с condition — первый заказ после refund
WITH ordered AS (
SELECT user_id, order_id, created_at, status,
LAG(status) OVER (PARTITION BY user_id ORDER BY created_at) AS prev_status
FROM orders
)
SELECT user_id, order_id, created_at
FROM ordered
WHERE prev_status = 'refunded' AND status != 'refunded';Как ответить на сложную SQL-задачу на собесе
- Проговорите подход перед кодом. «Разобью на CTE: сначала подготовлю X, потом рассчитаю Y».
- Пишите постепенно. Не весь запрос сразу — блоками.
- Комментируйте выбор. «CTE вместо подзапроса для читаемости».
- Упомяните альтернативы. «Можно через оконную функцию — будет быстрее».
- Признавайтесь в неуверенности. «Я бы проверил это через EXPLAIN».
Читайте также
- Задачи по SQL для junior
- Задачи на оконные функции
- Задачи на JOIN
- Задачи на когортный анализ
- Оптимизация SQL
FAQ
Сколько задач нужно решить для middle?
200-300 задач, включая сложные на оконные функции, рекурсивные CTE, session analysis. За 1-2 месяца реально.
Что отличает middle от junior?
Middle решает задачи, требующие нетривиальных приёмов (сессии, когорты, anti-patterns). Junior справляется с прямолинейными JOIN + GROUP BY.
Нужно ли знать CTE рекурсию?
На middle+ — да. На junior — не обязательно.
Как подготовиться к SQL-собесу middle?
- Прогнать 50 задач на оконные. 2. Прогнать 30 задач на CTE. 3. Понять EXPLAIN. 4. Практиковать live-coding перед зеркалом или на leetcode.