Как посчитать IQR в SQL
Содержание:
Зачем IQR
IQR (Interquartile Range) = Q3 - Q1. Робастная мера spread. Не зависит от outliers. Для skewed distributions (revenue, salaries) — лучше SD.
Формула
Q1 = 25th percentile
Q3 = 75th percentile
IQR = Q3 - Q150% values лежат между Q1 и Q3.
Базовый расчёт
WITH stats AS (
SELECT
PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount) AS q1,
PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY amount) AS median,
PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount) AS q3,
AVG(amount) AS mean,
STDDEV(amount) AS sd
FROM transactions
WHERE status = 'paid'
AND created_at >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT
q1,
median,
q3,
q3 - q1 AS iqr,
mean,
sd
FROM stats;Outlier detection
Tukey's fence: outlier если > Q3 + 1.5 × IQR или < Q1 - 1.5 × IQR.
WITH bounds AS (
SELECT
PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount) AS q1,
PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount) AS q3
FROM transactions
WHERE status = 'paid'
),
limits AS (
SELECT
q1,
q3,
q3 - q1 AS iqr,
q1 - 1.5 * (q3 - q1) AS lower_fence,
q3 + 1.5 * (q3 - q1) AS upper_fence
FROM bounds
)
SELECT
t.transaction_id,
t.user_id,
t.amount,
l.lower_fence,
l.upper_fence,
CASE
WHEN t.amount > l.upper_fence THEN 'high outlier'
WHEN t.amount < l.lower_fence THEN 'low outlier'
ELSE 'normal'
END AS outlier_status
FROM transactions t, limits l
WHERE t.status = 'paid'
ORDER BY t.amount DESC
LIMIT 100;По группам
SELECT
country,
PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount) AS q1,
PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY amount) AS median,
PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount) AS q3,
PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount)
- PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount) AS iqr
FROM transactions
JOIN users USING (user_id)
WHERE status = 'paid'
GROUP BY country
ORDER BY iqr DESC;High IQR = wide spread в country.
IQR vs SD
SELECT
COUNT(*) AS n,
AVG(amount) AS mean,
STDDEV(amount) AS sd,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median,
PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount)
- PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount) AS iqr
FROM transactions;- SD assumes ~normal. Outliers heavily skew.
- IQR robust. Real-world skewed data — IQR более informative.
For normal: SD ≈ IQR / 1.35.
Частые ошибки
Ошибка 1. 1.5 × IQR threshold rigid. Tukey's 1.5 — convention. 3 × IQR для extreme outliers. Adjust по контексту.
Ошибка 2. IQR на binary data. Binary (0/1) — IQR meaningless.
Ошибка 3. Small N. N < 10 — percentiles unreliable. IQR too.
Ошибка 4. Outlier ≠ error. Whale 100x median = legitimate outlier. Don't auto-delete.
Ошибка 5. IQR on cumulative metrics. LTV cumulative → grows with tenure. IQR per tenure cohort.
Связанные темы
- Как посчитать standard deviation в SQL
- Как посчитать перцентили в SQL
- Как посчитать медиану в SQL
- Как посчитать anomaly detection в SQL
FAQ
IQR vs SD?
IQR robust. SD assumes normal. For skewed: IQR.
1.5 × IQR origin?
Tukey's convention. Catches 99.3% data в normal distribution.
IQR для не-normal data?
Yes! Главная фишка IQR — robustness к non-normality.
IQR в Python?
import numpy as np
iqr = np.percentile(data, 75) - np.percentile(data, 25)Box plot?
Visual IQR. Box = Q1-Q3. Whiskers = 1.5×IQR fences. Dots = outliers.