Как посчитать Trimmed Mean в SQL

Закрепи формулу trimmed mean в Карьернике
Запомнить надолго — 5 коротких сессий с задачами на эту тему. Бесплатно
Тренировать trimmed mean в Telegram

Зачем Trimmed Mean

Average sensitive к outliers. Median robust но теряет sensitivity. Trimmed Mean — compromise: drop top/bottom X%, average оставшихся. Часто используется в A/B-тестах с heavy-tailed metrics (revenue).

Формула

Trimmed Mean (p%) = AVG of middle (100 - 2p)% values

Standard: 5% trimmed (drop top 5%, bottom 5%, average remaining 90%).

Базовый расчёт

5% trimmed mean:

WITH bounds AS (
    SELECT
        PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY amount) AS p5,
        PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY amount) AS p95
    FROM transactions
    WHERE status = 'paid'
)
SELECT
    AVG(t.amount) AS trimmed_mean_5pct
FROM transactions t, bounds b
WHERE t.status = 'paid'
  AND t.amount BETWEEN b.p5 AND b.p95;

10% trimmed (more aggressive):

WITH bounds AS (
    SELECT
        PERCENTILE_CONT(0.10) WITHIN GROUP (ORDER BY amount) AS p10,
        PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY amount) AS p90
    FROM transactions
)
SELECT AVG(amount) AS trimmed_mean_10pct
FROM transactions t, bounds b
WHERE amount BETWEEN b.p10 AND b.p90;

Winsorized Mean

Альтернатива: replace outliers с p5/p95 values (не drop):

WITH bounds AS (
    SELECT
        PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY amount) AS p5,
        PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY amount) AS p95
    FROM transactions
)
SELECT
    AVG(
        CASE
            WHEN t.amount < b.p5 THEN b.p5
            WHEN t.amount > b.p95 THEN b.p95
            ELSE t.amount
        END
    ) AS winsorized_mean
FROM transactions t, bounds b;

Winsorize preserves sample size. Trim reduces.

По группам

WITH bounds AS (
    SELECT
        country,
        PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY amount) AS p5,
        PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY amount) AS p95
    FROM transactions
    JOIN users USING (user_id)
    GROUP BY country
)
SELECT
    t.country,
    AVG(t.amount) FILTER (WHERE t.amount BETWEEN b.p5 AND b.p95) AS trimmed_mean,
    AVG(t.amount) AS raw_mean
FROM transactions t
JOIN users u USING (user_id)
JOIN bounds b ON b.country = u.country
WHERE t.status = 'paid'
GROUP BY t.country;
Закрепи формулу trimmed mean в Карьернике
Запомнить надолго — 5 коротких сессий с задачами на эту тему. Бесплатно
Тренировать trimmed mean в Telegram

Trimmed vs Median

WITH t AS (
    SELECT
        AVG(amount) AS mean,
        PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median,
        -- 5% trimmed
        AVG(amount) FILTER (
            WHERE amount BETWEEN PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY amount) OVER ()
                            AND PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY amount) OVER ()
        ) AS trimmed
    FROM transactions
)
SELECT * FROM t;

(syntax can vary by dialect — adjust)

Метрика Robustness Sensitivity
Mean Low High
Median Highest Low
Trimmed (5%) High Medium-high

A/B test usage

В A/B-тестах часто использовать trimmed mean для revenue:

WITH bounds AS (
    SELECT PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY revenue) AS p99
    FROM ab_test_data
)
SELECT
    variant,
    AVG(revenue) AS raw_arpu,
    AVG(LEAST(revenue, b.p99)) AS winsorized_arpu  -- 99% cap
FROM ab_test_data t, bounds b
GROUP BY variant;

99% winsorization — стандарт в Netflix / Google.

Частые ошибки

Ошибка 1. Wrong trim %. 5% trim — типичный. 50% trim = median. Don't over-trim.

Ошибка 2. Trim per group. Different groups → different cuts? Per-group bounds may be needed.

Ошибка 3. Negative values trim. For revenue (always positive) — only top trim makes sense.

Ошибка 4. Bias introduction. Heavy-tail informative. Trimming hides signal.

Ошибка 5. Inconsistent в A/B. Trim A and B at same percentiles (pooled, not separately).

Связанные темы

FAQ

Какой trim %?

5% — moderate. 10% — aggressive. 1% — only extreme outliers.

Trimmed vs Winsorized?

Trimmed drops. Winsorized caps. Winsorized preserves N.

Use в A/B?

Часто. Revenue / count of actions, heavy-tailed. Trimmed mean — robust effect estimate.

Per-group или pooled trim?

Pooled (combined sample). Per-group → bias.

Trim 50% = median?

Theoretically. Practically median computed differently. Equivalent для symmetric distributions.