Как посчитать Skewness в SQL

Закрепи формулу skewness в Карьернике
Запомнить надолго — 5 коротких сессий с задачами на эту тему. Бесплатно
Тренировать skewness в Telegram

Зачем Skewness

Skewness measures асимметрию distribution. Right skew (positive) — long tail справа (revenue, salaries). Left skew (negative) — long tail слева (test scores близко к max). Skewness = 0 — symmetric (normal).

Формула

Sample skewness:

γ = (1/n) × Σ((xi - x̄) / s)³

Pearson's median skewness (simpler):

γ = 3 × (mean - median) / sd

Pearson skewness (простая)

WITH stats AS (
    SELECT
        AVG(amount) AS mean,
        PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median,
        STDDEV(amount) AS sd
    FROM transactions
    WHERE status = 'paid'
)
SELECT
    mean,
    median,
    sd,
    3.0 * (mean - median) / NULLIF(sd, 0) AS pearson_skew
FROM stats;

Sample skewness

WITH stats AS (
    SELECT
        AVG(amount) AS mean,
        STDDEV(amount) AS sd,
        COUNT(*) AS n
    FROM transactions
    WHERE status = 'paid'
),
moments AS (
    SELECT
        AVG(POWER((amount - s.mean) / NULLIF(s.sd, 0), 3)) AS skew
    FROM transactions t, stats s
    WHERE t.status = 'paid'
)
SELECT skew FROM moments;

Интерпретация

Skew Interpretation
0 Symmetric (normal)
> 0 Right-skewed (long right tail)
< 0 Left-skewed (long left tail)
> 1 Highly skewed
> 2 Extremely skewed
-- Decision rule
SELECT
    mean,
    median,
    3.0 * (mean - median) / NULLIF(sd, 0) AS skew,
    CASE
        WHEN ABS(3.0 * (mean - median) / NULLIF(sd, 0)) < 0.5 THEN 'symmetric'
        WHEN 3.0 * (mean - median) / NULLIF(sd, 0) > 1 THEN 'heavily RIGHT-skewed (whales)'
        WHEN 3.0 * (mean - median) / NULLIF(sd, 0) > 0.5 THEN 'RIGHT-skewed'
        WHEN 3.0 * (mean - median) / NULLIF(sd, 0) < -1 THEN 'heavily LEFT-skewed'
        ELSE 'LEFT-skewed'
    END AS shape
FROM stats;
Закрепи формулу skewness в Карьернике
Запомнить надолго — 5 коротких сессий с задачами на эту тему. Бесплатно
Тренировать skewness в Telegram

По группам

SELECT
    country,
    AVG(amount) AS mean,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median,
    STDDEV(amount) AS sd,
    3.0 * (AVG(amount) - PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount)) / NULLIF(STDDEV(amount), 0) AS skew
FROM transactions
JOIN users USING (user_id)
WHERE status = 'paid'
GROUP BY country
ORDER BY skew DESC;

Why care?

  • Mean vs Median: Right-skew → mean > median. Report median.
  • A/B tests: Heavy skew → use non-parametric / bootstrap, not t-test.
  • Outliers: Skew indicates outliers presence.
  • Transformations: log(x) often normalize right-skewed data.

Частые ошибки

Ошибка 1. Skew = 0 = normal. Skew = 0 — symmetric. Не гарантирует normality (kurtosis тоже).

Ошибка 2. Sample size matters. N < 50 — skew estimate unreliable.

Ошибка 3. Pearson vs sample formula. Different formulas. Pearson — simpler approximation.

Ошибка 4. Log-transform на 0/negative. Log requires > 0. Add constant первый: log(x + 1).

Ошибка 5. SD undefined. Pearson skew = (mean - median) / sd. Sd = 0 → undefined.

Связанные темы

FAQ

Skewness vs Kurtosis?

Skew — asymmetry. Kurtosis — tail heaviness (peaked vs flat).

Negative skew когда?

Test scores где max каpped (100% — test). Tail слева.

Skew > 1 — что значит?

Heavy right tail. Use median вместо mean.

Log transform?

Right-skewed (positive only) → log сглаживает. Normality test после.

Bootstrap для skewed?

Yes. Non-parametric, не assumes normality. CI на median through bootstrap.