Как посчитать медиану в SQL
Содержание:
Зачем медиана
Медиана — значение в середине отсортированного ряда. Robust к outliers. Whales в revenue не двигают медиану. Среднее (mean) — двигают.
PERCENTILE_CONT
Continuous percentile — интерполирует между values:
SELECT
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median_amount
FROM transactions
WHERE status = 'paid'
AND created_at >= CURRENT_DATE - INTERVAL '30 days';-- Sample: [1, 2, 3, 4]
-- PERCENTILE_CONT(0.5) = (2 + 3) / 2 = 2.5PERCENTILE_DISC
Discrete — возвращает actual value (не интерполирует):
SELECT
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY amount) AS median_amount
FROM transactions
WHERE status = 'paid';-- Sample: [1, 2, 3, 4]
-- PERCENTILE_DISC(0.5) = 2 (lower of two middles)Use CONT для continuous values, DISC для categorical / discrete.
По группам
SELECT
country,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median_amount,
PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount) AS p25,
PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount) AS p75,
AVG(amount) AS mean_amount
FROM transactions
JOIN users USING (user_id)
WHERE status = 'paid'
AND created_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY country
ORDER BY median_amount DESC;Median vs Mean
WITH stats AS (
SELECT
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median_amt,
AVG(amount) AS mean_amt,
STDDEV(amount) AS std_amt,
COUNT(*) AS n
FROM transactions
WHERE status = 'paid'
)
SELECT
median_amt,
mean_amt,
mean_amt - median_amt AS skew_indicator,
-- Positive skew → mean > median (whales)
CASE
WHEN mean_amt > median_amt * 1.5 THEN 'heavily RIGHT-skewed'
WHEN mean_amt > median_amt * 1.1 THEN 'RIGHT-skewed'
ELSE 'symmetric'
END AS distribution
FROM stats;MySQL / older databases
PERCENTILE_CONT не во всех. Workaround:
SELECT AVG(t.amount) AS median
FROM (
SELECT amount, ROW_NUMBER() OVER (ORDER BY amount) AS rn,
COUNT(*) OVER () AS total
FROM transactions
WHERE status = 'paid'
) t
WHERE t.rn IN (FLOOR((t.total + 1) / 2), CEIL((t.total + 1) / 2));Частые ошибки
Ошибка 1. AVG вместо median для revenue. Whales (top 1%) могут pull AVG вверх 2x. Median robust.
Ошибка 2. PERCENTILE_CONT не in HAVING. В большинстве БД нельзя использовать в HAVING. Wrap в subquery.
Ошибка 3. Empty groups. Median over empty group = NULL.
Ошибка 4. Performance. PERCENTILE_CONT медленный на больших таблицах. Sample / approximate.
Ошибка 5. Median per row. Rolling median требует window functions + percentile (complex).
Связанные темы
- Медиана vs среднее
- Как посчитать перцентили в SQL
- Как посчитать standard deviation в SQL
- Как проверить значимо ли среднее
FAQ
PERCENTILE_CONT vs PERCENTILE_DISC?
CONT интерполирует, DISC возвращает actual. CONT для continuous.
Median or Mean?
Right-skewed (revenue, salaries): median. Symmetric (heights, scores): mean.
PostgreSQL syntax?
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY col). Window function syntax.
Median window function?
PERCENTILE_CONT не как window. Workaround: ROW_NUMBER + COUNT + filter.
Mode (most common value)?
SELECT MODE() WITHIN GROUP (ORDER BY col) FROM ...