Задачи по SQL для middle-аналитика

Что ждут от middle

Middle-аналитика проверяют на:

  • Продвинутые оконные функции (LAG, LEAD, SUM OVER, ROWS BETWEEN).
  • Рекурсивные CTE.
  • Session-based analysis.
  • Когортные запросы.
  • Сложные подзапросы и JOIN.
  • Оптимизация запросов.

Ниже 15 задач реального middle-уровня.

1. Когортная таблица retention

WITH cohort AS (
    SELECT user_id, DATE_TRUNC('month', MIN(event_time))::DATE AS cohort_month
    FROM events GROUP BY user_id
),
activity AS (
    SELECT c.user_id, c.cohort_month,
        DATE_TRUNC('month', e.event_time)::DATE AS active_month,
        EXTRACT(MONTH FROM AGE(DATE_TRUNC('month', e.event_time),
                               c.cohort_month))::int AS period_num
    FROM cohort c
    JOIN events e USING (user_id)
)
SELECT cohort_month, period_num,
    COUNT(DISTINCT user_id) AS active_users
FROM activity
GROUP BY cohort_month, period_num
ORDER BY cohort_month, period_num;

2. Session identification (30-минутный тайм-аут)

WITH diffs AS (
    SELECT user_id, event_time,
        EXTRACT(EPOCH FROM (event_time -
            LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time)
        )) / 60 AS min_gap
    FROM events
),
sessions AS (
    SELECT user_id, event_time,
        SUM(CASE WHEN min_gap > 30 OR min_gap IS NULL THEN 1 ELSE 0 END)
            OVER (PARTITION BY user_id ORDER BY event_time) AS session_id
    FROM diffs
)
SELECT user_id, session_id,
    MIN(event_time) AS session_start,
    MAX(event_time) AS session_end,
    COUNT(*) AS events_count
FROM sessions
GROUP BY user_id, session_id;

3. Время между покупками (AVG gap per user)

WITH gaps AS (
    SELECT user_id,
        EXTRACT(DAY FROM (
            created_at - LAG(created_at) OVER (PARTITION BY user_id ORDER BY created_at)
        )) AS days_gap
    FROM orders
)
SELECT user_id, AVG(days_gap) AS avg_gap_days
FROM gaps
WHERE days_gap IS NOT NULL
GROUP BY user_id;

4. Воронка с временным окном (24 часа)

WITH events_with_prev AS (
    SELECT user_id, event, event_time,
        LAG(event) OVER (PARTITION BY user_id ORDER BY event_time) AS prev,
        LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_time
    FROM events
)
SELECT
    COUNT(DISTINCT user_id) FILTER (WHERE event = 'view') AS step1,
    COUNT(DISTINCT user_id) FILTER (
        WHERE event = 'cart' AND prev = 'view'
        AND event_time - prev_time < INTERVAL '24 hour'
    ) AS step2,
    COUNT(DISTINCT user_id) FILTER (
        WHERE event = 'purchase' AND prev = 'cart'
        AND event_time - prev_time < INTERVAL '24 hour'
    ) AS step3
FROM events_with_prev;

5. Рекурсивный CTE: иерархия категорий

WITH RECURSIVE tree AS (
    SELECT id, name, parent_id, 1 AS level
    FROM categories WHERE parent_id IS NULL
    UNION ALL
    SELECT c.id, c.name, c.parent_id, t.level + 1
    FROM categories c JOIN tree t ON c.parent_id = t.id
)
SELECT * FROM tree ORDER BY level, name;

Тренироваться на таких вопросах можно в Telegram-боте Карьерник — там 1500+ задач с реальных собесов с разборами.

6. Заполнить пропущенные дни нулями

WITH RECURSIVE cal AS (
    SELECT DATE '2026-04-01' AS day
    UNION ALL
    SELECT day + INTERVAL '1 day' FROM cal WHERE day < DATE '2026-04-30'
)
SELECT c.day, COALESCE(SUM(o.amount), 0) AS revenue
FROM cal c
LEFT JOIN orders o ON o.created_at::DATE = c.day
GROUP BY c.day
ORDER BY c.day;

7. Топ-3 продукта в каждой категории, с общей суммой

WITH ranked AS (
    SELECT p.category, p.product_id, SUM(o.amount) AS rev,
        ROW_NUMBER() OVER (PARTITION BY p.category ORDER BY SUM(o.amount) DESC) AS rn,
        SUM(SUM(o.amount)) OVER (PARTITION BY p.category) AS cat_total
    FROM products p
    JOIN orders o USING (product_id)
    GROUP BY p.category, p.product_id
)
SELECT *,
    ROUND(100.0 * rev / cat_total, 2) AS pct_of_category
FROM ranked
WHERE rn <= 3;

8. Пользователи с прогрессивной активностью (месяц-над-месяцем)

WITH monthly AS (
    SELECT user_id, DATE_TRUNC('month', created_at) AS month,
        COUNT(*) AS orders
    FROM orders GROUP BY 1, 2
),
with_prev AS (
    SELECT user_id, month, orders,
        LAG(orders) OVER (PARTITION BY user_id ORDER BY month) AS prev_orders
    FROM monthly
)
SELECT DISTINCT user_id
FROM with_prev
WHERE orders > prev_orders
  AND month >= '2026-01-01';

9. Sliding metrics — rolling 7 days unique users

SELECT day,
    COUNT(DISTINCT user_id) OVER (
        ORDER BY day
        RANGE BETWEEN INTERVAL '6 day' PRECEDING AND CURRENT ROW
    ) AS rolling_dau_7
FROM (
    SELECT created_at::DATE AS day, user_id FROM events
) d;

10. Simpson's paradox — find aggregated trick

-- Платформа A: конверсия просела по каждому сегменту
-- Но общая конверсия выросла?!
WITH conv AS (
    SELECT period, segment,
        COUNT(*) FILTER (WHERE converted) * 1.0 / COUNT(*) AS conversion,
        COUNT(*) AS volume
    FROM events GROUP BY period, segment
)
SELECT period, segment, conversion, volume
FROM conv
ORDER BY period, segment;
-- Посмотреть, как сдвигается mix volume между сегментами

11. Upsert через SQL (MERGE-ish в PostgreSQL)

INSERT INTO user_snapshots (user_id, snapshot_date, total_revenue)
SELECT user_id, CURRENT_DATE, SUM(amount)
FROM orders GROUP BY user_id
ON CONFLICT (user_id, snapshot_date) DO UPDATE
SET total_revenue = EXCLUDED.total_revenue;

12. Аномалия: z-score на выручку

WITH stats AS (
    SELECT AVG(amount) AS mu, STDDEV(amount) AS sd FROM orders
)
SELECT o.order_id, o.amount,
    (o.amount - s.mu) / s.sd AS z_score
FROM orders o, stats s
WHERE ABS((o.amount - s.mu) / s.sd) > 3;

К слову, набить руку на таких кейсах удобно через тренажёр в Telegram — разбирайте по 10 вопросов в день, через 2 недели тема становится рефлексом.

13. Сложный FILTER с агрегатом

SELECT category,
    COUNT(*) AS total,
    COUNT(*) FILTER (WHERE amount > 1000 AND status = 'paid') AS paid_large,
    SUM(amount) FILTER (WHERE status = 'paid') AS paid_revenue,
    AVG(amount) FILTER (WHERE status = 'refunded') AS avg_refund
FROM orders JOIN products USING (product_id)
GROUP BY category;

14. Декомпозиция GMV через multi-rollup

SELECT
    COALESCE(platform, 'ALL') AS platform,
    COALESCE(category, 'ALL') AS category,
    SUM(amount) AS gmv
FROM orders JOIN products USING (product_id)
GROUP BY ROLLUP (platform, category);

15. Lead/lag с condition — первый заказ после refund

WITH ordered AS (
    SELECT user_id, order_id, created_at, status,
        LAG(status) OVER (PARTITION BY user_id ORDER BY created_at) AS prev_status
    FROM orders
)
SELECT user_id, order_id, created_at
FROM ordered
WHERE prev_status = 'refunded' AND status != 'refunded';

Как ответить на сложную SQL-задачу на собесе

  1. Проговорите подход перед кодом. «Разобью на CTE: сначала подготовлю X, потом рассчитаю Y».
  2. Пишите постепенно. Не весь запрос сразу — блоками.
  3. Комментируйте выбор. «CTE вместо подзапроса для читаемости».
  4. Упомяните альтернативы. «Можно через оконную функцию — будет быстрее».
  5. Признавайтесь в неуверенности. «Я бы проверил это через EXPLAIN».

Читайте также

FAQ

Сколько задач нужно решить для middle?

200-300 задач, включая сложные на оконные функции, рекурсивные CTE, session analysis. За 1-2 месяца реально.

Что отличает middle от junior?

Middle решает задачи, требующие нетривиальных приёмов (сессии, когорты, anti-patterns). Junior справляется с прямолинейными JOIN + GROUP BY.

Нужно ли знать CTE рекурсию?

На middle+ — да. На junior — не обязательно.

Как подготовиться к SQL-собесу middle?

  1. Прогнать 50 задач на оконные. 2. Прогнать 30 задач на CTE. 3. Понять EXPLAIN. 4. Практиковать live-coding перед зеркалом или на leetcode.