Задачи по группировке SQL
Карьерник — квиз-тренажёр в Telegram с 1500+ вопросами для собесов аналитика. SQL, Python, A/B, метрики. Бесплатно.
Зачем это знать
«Посчитай revenue по странам» — простая GROUP BY. «Посчитай средний чек для top-10% клиентов и сравни с остальными» — уже непросто. На собесах middle-аналитика группировка и агрегация встречаются в каждой 3-й задаче.
Задача 1: revenue по странам
SELECT country, SUM(total) AS revenue
FROM orders
GROUP BY country
ORDER BY revenue DESC;Базовая аггрегация.
Задача 2: users с > 3 заказами
SELECT user_id, COUNT(*) AS orders_count
FROM orders
GROUP BY user_id
HAVING COUNT(*) > 3;HAVING — после группировки. WHERE — до.
Задача 3: average без outliers
«Средний чек, игнорируя top 1%».
WITH ranked AS (
SELECT total, PERCENT_RANK() OVER (ORDER BY total) AS pct
FROM orders
)
SELECT AVG(total) AS trimmed_avg
FROM ranked
WHERE pct < 0.99;Trimmed mean.
Задача 4: multi-level group by
«Revenue по стране и типу товара».
SELECT country, product_type, SUM(total) AS revenue
FROM orders
JOIN products USING (product_id)
GROUP BY country, product_type;Задача 5: GROUP BY ROLLUP
«Subtotals + grand total».
SELECT country, product_type, SUM(total)
FROM orders
JOIN products USING (product_id)
GROUP BY ROLLUP(country, product_type);Получим:
- Россия, Electronics: 100
- Россия, Clothing: 50
- Россия, NULL: 150 (subtotal)
- NULL, NULL: 150 (grand total)
Задача 6: GROUP BY GROUPING SETS
Кастомные sub-totals:
SELECT country, product_type, SUM(total)
FROM ...
GROUP BY GROUPING SETS (
(country, product_type),
(country),
()
);Задача 7: conditional aggregation
«Сколько orders в каждом статусе, в одной строке».
SELECT
COUNT(CASE WHEN status = 'completed' THEN 1 END) AS completed,
COUNT(CASE WHEN status = 'pending' THEN 1 END) AS pending,
COUNT(CASE WHEN status = 'cancelled' THEN 1 END) AS cancelled
FROM orders;Pivoting через CASE.
Задача 8: median per group
SELECT
country,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY total) AS median_order
FROM orders
GROUP BY country;В Postgres / BigQuery / Snowflake. В ClickHouse — quantile(0.5).
Задача 9: топ категория per country
«Для каждой страны — самая популярная категория».
WITH ranked AS (
SELECT
country,
category,
COUNT(*) AS orders,
ROW_NUMBER() OVER (PARTITION BY country ORDER BY COUNT(*) DESC) AS rn
FROM orders
JOIN products USING (product_id)
GROUP BY country, category
)
SELECT country, category, orders
FROM ranked WHERE rn = 1;GROUP BY внутри window.
Задача 10: cumulative aggregation
«Накопленная сумма по дням».
SELECT
day,
SUM(total) AS daily_revenue,
SUM(SUM(total)) OVER (ORDER BY day) AS cumulative
FROM orders
GROUP BY day;Window over aggregated values.
Задача 11: average of averages vs weighted
Частая ловушка.
-- Average of averages — часто неправильно
SELECT AVG(avg_per_user)
FROM (
SELECT user_id, AVG(total) AS avg_per_user
FROM orders
GROUP BY user_id
) t;
-- Correct weighted average
SELECT AVG(total) FROM orders;Weighted (просто AVG) обычно правильнее.
Задача 12: COUNT DISTINCT по группам
«Уникальные users per day».
SELECT day, COUNT(DISTINCT user_id) AS dau
FROM events
GROUP BY day;Задача 13: percentage within group
«Доля каждого product в своей категории».
SELECT
category,
product,
SUM(revenue) AS prod_rev,
SUM(revenue) * 100.0 / SUM(SUM(revenue)) OVER (PARTITION BY category) AS pct
FROM products
GROUP BY category, product;Задача 14: аггрегация дат
«Первая покупка каждого user».
SELECT user_id, MIN(created_at) AS first_order
FROM orders
GROUP BY user_id;Min / Max часто упускают в пользу ROW_NUMBER — но проще.
Частые ошибки
SELECT column, не в GROUP BY
SELECT country, product_type, SUM(total) FROM orders GROUP BY country;Error (product_type не aggregated и не в group by).
AVG of averages
См. Задача 11. Обычно wrong.
COUNT(column) vs COUNT(*)
COUNT(column) — без NULL. COUNT(*) — все. Разница важна.
HAVING on non-aggregated
GROUP BY user_id HAVING created_at > '2026-01-01'Error — HAVING только на aggregates.
На собесе
Пройдите 13 задач выше. Know:
- GROUP BY vs HAVING
- WHERE vs HAVING
- ROLLUP / GROUPING SETS
- Conditional aggregation
- AVG ловушки
Связанные темы
FAQ
GROUP BY vs DISTINCT?
Для unique values — эквивалентны. GROUP BY удобнее с aggregation.
ROLLUP везде?
Postgres, Oracle, BigQuery. MySQL — через WITH ROLLUP.
Window в aggregation?
Works: SUM(SUM(x)) OVER (...) legal конструкция.
Тренируйте SQL — откройте тренажёр с 1500+ вопросами для собесов.