Как посчитать медиану в SQL

Закрепи формулу median в Карьернике
Запомнить надолго — 5 коротких сессий с задачами на эту тему. Бесплатно
Тренировать median в Telegram

Зачем медиана

Медиана — значение в середине отсортированного ряда. Robust к outliers. Whales в revenue не двигают медиану. Среднее (mean) — двигают.

PERCENTILE_CONT

Continuous percentile — интерполирует между values:

SELECT
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median_amount
FROM transactions
WHERE status = 'paid'
  AND created_at >= CURRENT_DATE - INTERVAL '30 days';
-- Sample: [1, 2, 3, 4]
-- PERCENTILE_CONT(0.5) = (2 + 3) / 2 = 2.5

PERCENTILE_DISC

Discrete — возвращает actual value (не интерполирует):

SELECT
    PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY amount) AS median_amount
FROM transactions
WHERE status = 'paid';
-- Sample: [1, 2, 3, 4]
-- PERCENTILE_DISC(0.5) = 2 (lower of two middles)

Use CONT для continuous values, DISC для categorical / discrete.

По группам

SELECT
    country,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median_amount,
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount) AS p25,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount) AS p75,
    AVG(amount) AS mean_amount
FROM transactions
JOIN users USING (user_id)
WHERE status = 'paid'
  AND created_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY country
ORDER BY median_amount DESC;
Закрепи формулу median в Карьернике
Запомнить надолго — 5 коротких сессий с задачами на эту тему. Бесплатно
Тренировать median в Telegram

Median vs Mean

WITH stats AS (
    SELECT
        PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median_amt,
        AVG(amount) AS mean_amt,
        STDDEV(amount) AS std_amt,
        COUNT(*) AS n
    FROM transactions
    WHERE status = 'paid'
)
SELECT
    median_amt,
    mean_amt,
    mean_amt - median_amt AS skew_indicator,
    -- Positive skew → mean > median (whales)
    CASE
        WHEN mean_amt > median_amt * 1.5 THEN 'heavily RIGHT-skewed'
        WHEN mean_amt > median_amt * 1.1 THEN 'RIGHT-skewed'
        ELSE 'symmetric'
    END AS distribution
FROM stats;

MySQL / older databases

PERCENTILE_CONT не во всех. Workaround:

SELECT AVG(t.amount) AS median
FROM (
    SELECT amount, ROW_NUMBER() OVER (ORDER BY amount) AS rn,
           COUNT(*) OVER () AS total
    FROM transactions
    WHERE status = 'paid'
) t
WHERE t.rn IN (FLOOR((t.total + 1) / 2), CEIL((t.total + 1) / 2));

Частые ошибки

Ошибка 1. AVG вместо median для revenue. Whales (top 1%) могут pull AVG вверх 2x. Median robust.

Ошибка 2. PERCENTILE_CONT не in HAVING. В большинстве БД нельзя использовать в HAVING. Wrap в subquery.

Ошибка 3. Empty groups. Median over empty group = NULL.

Ошибка 4. Performance. PERCENTILE_CONT медленный на больших таблицах. Sample / approximate.

Ошибка 5. Median per row. Rolling median требует window functions + percentile (complex).

Связанные темы

FAQ

PERCENTILE_CONT vs PERCENTILE_DISC?

CONT интерполирует, DISC возвращает actual. CONT для continuous.

Median or Mean?

Right-skewed (revenue, salaries): median. Symmetric (heights, scores): mean.

PostgreSQL syntax?

PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY col). Window function syntax.

Median window function?

PERCENTILE_CONT не как window. Workaround: ROW_NUMBER + COUNT + filter.

Mode (most common value)?

SELECT MODE() WITHIN GROUP (ORDER BY col) FROM ...