Задачи по группировке SQL

Карьерник — квиз-тренажёр в Telegram с 1500+ вопросами для собесов аналитика. SQL, Python, A/B, метрики. Бесплатно.

Зачем это знать

«Посчитай revenue по странам» — простая GROUP BY. «Посчитай средний чек для top-10% клиентов и сравни с остальными» — уже непросто. На собесах middle-аналитика группировка и агрегация встречаются в каждой 3-й задаче.

Задача 1: revenue по странам

SELECT country, SUM(total) AS revenue
FROM orders
GROUP BY country
ORDER BY revenue DESC;

Базовая аггрегация.

Задача 2: users с > 3 заказами

SELECT user_id, COUNT(*) AS orders_count
FROM orders
GROUP BY user_id
HAVING COUNT(*) > 3;

HAVING — после группировки. WHERE — до.

Задача 3: average без outliers

«Средний чек, игнорируя top 1%».

WITH ranked AS (
    SELECT total, PERCENT_RANK() OVER (ORDER BY total) AS pct
    FROM orders
)
SELECT AVG(total) AS trimmed_avg
FROM ranked
WHERE pct < 0.99;

Trimmed mean.

Задача 4: multi-level group by

«Revenue по стране и типу товара».

SELECT country, product_type, SUM(total) AS revenue
FROM orders
JOIN products USING (product_id)
GROUP BY country, product_type;

Задача 5: GROUP BY ROLLUP

«Subtotals + grand total».

SELECT country, product_type, SUM(total)
FROM orders
JOIN products USING (product_id)
GROUP BY ROLLUP(country, product_type);

Получим:

  • Россия, Electronics: 100
  • Россия, Clothing: 50
  • Россия, NULL: 150 (subtotal)
  • NULL, NULL: 150 (grand total)

Задача 6: GROUP BY GROUPING SETS

Кастомные sub-totals:

SELECT country, product_type, SUM(total)
FROM ...
GROUP BY GROUPING SETS (
    (country, product_type),
    (country),
    ()
);

Задача 7: conditional aggregation

«Сколько orders в каждом статусе, в одной строке».

SELECT
    COUNT(CASE WHEN status = 'completed' THEN 1 END) AS completed,
    COUNT(CASE WHEN status = 'pending' THEN 1 END) AS pending,
    COUNT(CASE WHEN status = 'cancelled' THEN 1 END) AS cancelled
FROM orders;

Pivoting через CASE.

Задача 8: median per group

SELECT
    country,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY total) AS median_order
FROM orders
GROUP BY country;

В Postgres / BigQuery / Snowflake. В ClickHouse — quantile(0.5).

Задача 9: топ категория per country

«Для каждой страны — самая популярная категория».

WITH ranked AS (
    SELECT
        country,
        category,
        COUNT(*) AS orders,
        ROW_NUMBER() OVER (PARTITION BY country ORDER BY COUNT(*) DESC) AS rn
    FROM orders
    JOIN products USING (product_id)
    GROUP BY country, category
)
SELECT country, category, orders
FROM ranked WHERE rn = 1;

GROUP BY внутри window.

Задача 10: cumulative aggregation

«Накопленная сумма по дням».

SELECT
    day,
    SUM(total) AS daily_revenue,
    SUM(SUM(total)) OVER (ORDER BY day) AS cumulative
FROM orders
GROUP BY day;

Window over aggregated values.

Задача 11: average of averages vs weighted

Частая ловушка.

-- Average of averages — часто неправильно
SELECT AVG(avg_per_user)
FROM (
    SELECT user_id, AVG(total) AS avg_per_user
    FROM orders
    GROUP BY user_id
) t;

-- Correct weighted average
SELECT AVG(total) FROM orders;

Weighted (просто AVG) обычно правильнее.

Задача 12: COUNT DISTINCT по группам

«Уникальные users per day».

SELECT day, COUNT(DISTINCT user_id) AS dau
FROM events
GROUP BY day;

Задача 13: percentage within group

«Доля каждого product в своей категории».

SELECT
    category,
    product,
    SUM(revenue) AS prod_rev,
    SUM(revenue) * 100.0 / SUM(SUM(revenue)) OVER (PARTITION BY category) AS pct
FROM products
GROUP BY category, product;

Задача 14: аггрегация дат

«Первая покупка каждого user».

SELECT user_id, MIN(created_at) AS first_order
FROM orders
GROUP BY user_id;

Min / Max часто упускают в пользу ROW_NUMBER — но проще.

Частые ошибки

SELECT column, не в GROUP BY

SELECT country, product_type, SUM(total) FROM orders GROUP BY country;

Error (product_type не aggregated и не в group by).

AVG of averages

См. Задача 11. Обычно wrong.

COUNT(column) vs COUNT(*)

COUNT(column) — без NULL. COUNT(*) — все. Разница важна.

HAVING on non-aggregated

GROUP BY user_id HAVING created_at > '2026-01-01'

Error — HAVING только на aggregates.

На собесе

Пройдите 13 задач выше. Know:

  • GROUP BY vs HAVING
  • WHERE vs HAVING
  • ROLLUP / GROUPING SETS
  • Conditional aggregation
  • AVG ловушки

Связанные темы

FAQ

GROUP BY vs DISTINCT?

Для unique values — эквивалентны. GROUP BY удобнее с aggregation.

ROLLUP везде?

Postgres, Oracle, BigQuery. MySQL — через WITH ROLLUP.

Window в aggregation?

Works: SUM(SUM(x)) OVER (...) legal конструкция.


Тренируйте SQL — откройте тренажёр с 1500+ вопросами для собесов.