Как посчитать cumulative sum в SQL
Содержание:
Зачем cumulative sum
Daily revenue показывает spikes. Cumulative — total накопленный over time. Хорошо для growth charts, milestone reporting, comparison with goals.
Базовый расчёт
SELECT
DATE,
daily_revenue,
SUM(daily_revenue) OVER (
ORDER BY DATE
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_revenue
FROM daily_revenue
ORDER BY DATE;Simpler form:
SUM(daily_revenue) OVER (ORDER BY DATE)(default frame для ORDER BY = UNBOUNDED PRECEDING to CURRENT ROW.)
По группам
Per-channel cumulative:
SELECT
DATE,
channel,
daily_revenue,
SUM(daily_revenue) OVER (
PARTITION BY channel
ORDER BY DATE
) AS cumulative_revenue_channel
FROM channel_revenue
ORDER BY channel, DATE;PARTITION BY isolates cumulative per channel.
Cumulative MRR
Track total revenue accumulated:
WITH daily AS (
SELECT
DATE_TRUNC('day', created_at) AS day,
SUM(amount) AS daily_revenue
FROM transactions
WHERE status = 'paid'
GROUP BY 1
)
SELECT
day,
daily_revenue,
SUM(daily_revenue) OVER (ORDER BY day) AS lifetime_revenue,
-- Milestones
CASE
WHEN SUM(daily_revenue) OVER (ORDER BY day) >= 1000000 THEN 'past $1M'
WHEN SUM(daily_revenue) OVER (ORDER BY day) >= 100000 THEN 'past $100k'
WHEN SUM(daily_revenue) OVER (ORDER BY day) >= 10000 THEN 'past $10k'
ELSE 'starting'
END AS milestone
FROM daily;Cumulative distinct users
Tricky — COUNT(DISTINCT) не работает с window functions в большинстве БД. Workaround:
WITH first_seen AS (
SELECT
user_id,
MIN(created_at::DATE) AS first_date
FROM events
GROUP BY user_id
)
SELECT
DATE,
COUNT(*) FILTER (WHERE first_date = DATE) AS new_users,
COUNT(*) AS cumulative_users
FROM (
SELECT
d.DATE,
u.user_id,
u.first_date
FROM (
SELECT generate_series(
(SELECT MIN(first_date) FROM first_seen),
CURRENT_DATE,
'1 day'::INTERVAL
)::DATE AS DATE
) d
JOIN first_seen u ON u.first_date <= d.DATE
) joined
GROUP BY DATE
ORDER BY DATE;Cumulative percent (Pareto)
SELECT
customer_id,
revenue,
SUM(revenue) OVER () AS total,
SUM(revenue) OVER (ORDER BY revenue DESC) AS cum_revenue,
SUM(revenue) OVER (ORDER BY revenue DESC) * 100.0 / SUM(revenue) OVER () AS cum_pct
FROM customers
ORDER BY revenue DESC;Pareto: top 20% customers = 80% revenue (check cum_pct ≈ 80% at row #N).
Частые ошибки
Ошибка 1. ORDER BY забыли. SUM OVER без ORDER BY = grand total на every row. Wrong.
Ошибка 2. Cumulative DISTINCT. Большинство БД не поддерживают. Workaround нужен.
Ошибка 3. Missing dates. Daily revenue с gaps → cumulative тоже broken. Generate_series для full series.
Ошибка 4. Reset на partition? PARTITION BY automatically resets. Если ожидал continuous → no partition.
Ошибка 5. RANGE vs ROWS. RANGE BETWEEN UNBOUNDED PRECEDING может merge ties. ROWS — strict row count.
Связанные темы
- Как посчитать moving average в SQL
- Cumulative distinct на собесе DE
- Window functions advanced
- Как посчитать rolling sum в SQL
FAQ
SUM OVER без ORDER BY — что считает?
Grand total (same value во all rows). Not cumulative.
Cumulative DISTINCT работает?
В PostgreSQL — нет. Workaround через self-join или first_seen pattern.
UNBOUNDED PRECEDING зачем?
Explicit frame. Default same, но explicit clearer для readers.
Cumulative resets по group?
С PARTITION BY — yes. Без — global.
Cumulative для daily активных?
Cumulative DAU = total unique ever active. Different from running total of DAU.