Как посчитать Pareto (80/20) в SQL
Содержание:
Зачем Pareto
Классика бизнес-аналитики: «20% клиентов дают 80% revenue». Pareto показывает concentration. Если top-20% дают 95% revenue — extreme concentration, riske. Если 50% — диверсифицированная база.
Формула
Top X% клиентов by revenue cumulative — какая доля total revenue.Базовый расчёт
Данные: orders(user_id, total, status, created_at).
WITH user_revenue AS (
SELECT
user_id,
SUM(total) AS revenue
FROM orders
WHERE status = 'paid'
AND created_at >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY user_id
),
ranked AS (
SELECT
user_id,
revenue,
SUM(revenue) OVER (ORDER BY revenue DESC) AS cumulative_revenue,
SUM(revenue) OVER () AS total_revenue,
ROW_NUMBER() OVER (ORDER BY revenue DESC) AS rn,
COUNT(*) OVER () AS total_users
FROM user_revenue
)
SELECT
rn,
user_id,
revenue,
rn::NUMERIC * 100 / total_users AS user_pct,
cumulative_revenue::NUMERIC * 100 / total_revenue AS rev_pct
FROM ranked
WHERE rn <= 100;Найдите row где user_pct ≈ 20 — это top-20% клиентов. Смотрите rev_pct.
По продуктам
WITH product_revenue AS (
SELECT product_id, SUM(amount) AS revenue
FROM order_items
GROUP BY product_id
),
ranked AS (
SELECT
product_id,
revenue,
SUM(revenue) OVER (ORDER BY revenue DESC) AS cum_rev,
SUM(revenue) OVER () AS total_rev,
ROW_NUMBER() OVER (ORDER BY revenue DESC) AS rn,
COUNT(*) OVER () AS total_products
FROM product_revenue
)
SELECT
rn,
product_id,
rn::NUMERIC * 100 / total_products AS product_pct,
cum_rev::NUMERIC * 100 / total_rev AS rev_pct
FROM ranked
ORDER BY rn;«SKU-tail»: 80% products дают 20% revenue. Часто candidates для discontinuation.
Cumulative chart
Bucketed:
WITH ranked AS (
SELECT
user_id,
SUM(total) AS revenue,
NTILE(10) OVER (ORDER BY SUM(total) DESC) AS decile
FROM orders
WHERE status = 'paid'
GROUP BY user_id
)
SELECT
decile,
COUNT(*) AS users_in_decile,
SUM(revenue) AS revenue,
SUM(revenue) * 100.0 / SUM(SUM(revenue)) OVER () AS rev_pct
FROM ranked
GROUP BY decile
ORDER BY decile;Decile 1 — top-10%. Их доля revenue показывает concentration.
Частые ошибки
Ошибка 1. Confuse 80/20 with literal numbers. «80% revenue from 20% customers» — это статистический рulr-of-thumb, не реальное правило. В вашем случае может быть 70/30 или 90/10.
Ошибка 2. Не cumulative. Считать каждого top-N independent — теряете точку distribution.
Ошибка 3. Кратко-срочное окно. 30 дней — носит лотерейный характер. Используйте 90+ days.
Ошибка 4. Включать non-active. Юзеры с 0 revenue — должны быть в знаменателе? Решите.
Ошибка 5. Pareto для всех metric. Pareto работает на revenue / orders. Для retention / engagement — формула не applies.
Связанные темы
FAQ
Какой Pareto considered ok?
Subscription / SaaS: 30-40% revenue от top-10%. Marketplace: 50-70%. Luxury / high-ticket: 70-90%.
Pareto слишком сильный — плохо?
Risk concentration. Если top-3 клиента уйдут — компания в risk. Diversify.
Pareto слишком слабый?
«No power users». Maybe нет engaged base. Тоже concerning, по-другому.
Pareto для разных metric?
Revenue — classic. Orders, sessions, post creations — также работают.
Time-based Pareto?
Может меняться: «80% revenue от 20% месяцев» (seasonal). Полезно для planning.