Как посчитать IQR в SQL

Закрепи формулу iqr в Карьернике
Запомнить надолго — 5 коротких сессий с задачами на эту тему. Бесплатно
Тренировать iqr в Telegram

Зачем IQR

IQR (Interquartile Range) = Q3 - Q1. Робастная мера spread. Не зависит от outliers. Для skewed distributions (revenue, salaries) — лучше SD.

Формула

Q1 = 25th percentile
Q3 = 75th percentile
IQR = Q3 - Q1

50% values лежат между Q1 и Q3.

Базовый расчёт

WITH stats AS (
    SELECT
        PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount) AS q1,
        PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY amount) AS median,
        PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount) AS q3,
        AVG(amount) AS mean,
        STDDEV(amount) AS sd
    FROM transactions
    WHERE status = 'paid'
      AND created_at >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT
    q1,
    median,
    q3,
    q3 - q1 AS iqr,
    mean,
    sd
FROM stats;

Outlier detection

Tukey's fence: outlier если > Q3 + 1.5 × IQR или < Q1 - 1.5 × IQR.

WITH bounds AS (
    SELECT
        PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount) AS q1,
        PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount) AS q3
    FROM transactions
    WHERE status = 'paid'
),
limits AS (
    SELECT
        q1,
        q3,
        q3 - q1 AS iqr,
        q1 - 1.5 * (q3 - q1) AS lower_fence,
        q3 + 1.5 * (q3 - q1) AS upper_fence
    FROM bounds
)
SELECT
    t.transaction_id,
    t.user_id,
    t.amount,
    l.lower_fence,
    l.upper_fence,
    CASE
        WHEN t.amount > l.upper_fence THEN 'high outlier'
        WHEN t.amount < l.lower_fence THEN 'low outlier'
        ELSE 'normal'
    END AS outlier_status
FROM transactions t, limits l
WHERE t.status = 'paid'
ORDER BY t.amount DESC
LIMIT 100;
Закрепи формулу iqr в Карьернике
Запомнить надолго — 5 коротких сессий с задачами на эту тему. Бесплатно
Тренировать iqr в Telegram

По группам

SELECT
    country,
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount) AS q1,
    PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY amount) AS median,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount) AS q3,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount)
    - PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount) AS iqr
FROM transactions
JOIN users USING (user_id)
WHERE status = 'paid'
GROUP BY country
ORDER BY iqr DESC;

High IQR = wide spread в country.

IQR vs SD

SELECT
    COUNT(*) AS n,
    AVG(amount) AS mean,
    STDDEV(amount) AS sd,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount)
    - PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount) AS iqr
FROM transactions;
  • SD assumes ~normal. Outliers heavily skew.
  • IQR robust. Real-world skewed data — IQR более informative.

For normal: SD ≈ IQR / 1.35.

Частые ошибки

Ошибка 1. 1.5 × IQR threshold rigid. Tukey's 1.5 — convention. 3 × IQR для extreme outliers. Adjust по контексту.

Ошибка 2. IQR на binary data. Binary (0/1) — IQR meaningless.

Ошибка 3. Small N. N < 10 — percentiles unreliable. IQR too.

Ошибка 4. Outlier ≠ error. Whale 100x median = legitimate outlier. Don't auto-delete.

Ошибка 5. IQR on cumulative metrics. LTV cumulative → grows with tenure. IQR per tenure cohort.

Связанные темы

FAQ

IQR vs SD?

IQR robust. SD assumes normal. For skewed: IQR.

1.5 × IQR origin?

Tukey's convention. Catches 99.3% data в normal distribution.

IQR для не-normal data?

Yes! Главная фишка IQR — robustness к non-normality.

IQR в Python?

import numpy as np
iqr = np.percentile(data, 75) - np.percentile(data, 25)

Box plot?

Visual IQR. Box = Q1-Q3. Whiskers = 1.5×IQR fences. Dots = outliers.