Как посчитать Trimmed Mean в SQL
Содержание:
Зачем Trimmed Mean
Average sensitive к outliers. Median robust но теряет sensitivity. Trimmed Mean — compromise: drop top/bottom X%, average оставшихся. Часто используется в A/B-тестах с heavy-tailed metrics (revenue).
Формула
Trimmed Mean (p%) = AVG of middle (100 - 2p)% valuesStandard: 5% trimmed (drop top 5%, bottom 5%, average remaining 90%).
Базовый расчёт
5% trimmed mean:
WITH bounds AS (
SELECT
PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY amount) AS p5,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY amount) AS p95
FROM transactions
WHERE status = 'paid'
)
SELECT
AVG(t.amount) AS trimmed_mean_5pct
FROM transactions t, bounds b
WHERE t.status = 'paid'
AND t.amount BETWEEN b.p5 AND b.p95;10% trimmed (more aggressive):
WITH bounds AS (
SELECT
PERCENTILE_CONT(0.10) WITHIN GROUP (ORDER BY amount) AS p10,
PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY amount) AS p90
FROM transactions
)
SELECT AVG(amount) AS trimmed_mean_10pct
FROM transactions t, bounds b
WHERE amount BETWEEN b.p10 AND b.p90;Winsorized Mean
Альтернатива: replace outliers с p5/p95 values (не drop):
WITH bounds AS (
SELECT
PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY amount) AS p5,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY amount) AS p95
FROM transactions
)
SELECT
AVG(
CASE
WHEN t.amount < b.p5 THEN b.p5
WHEN t.amount > b.p95 THEN b.p95
ELSE t.amount
END
) AS winsorized_mean
FROM transactions t, bounds b;Winsorize preserves sample size. Trim reduces.
По группам
WITH bounds AS (
SELECT
country,
PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY amount) AS p5,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY amount) AS p95
FROM transactions
JOIN users USING (user_id)
GROUP BY country
)
SELECT
t.country,
AVG(t.amount) FILTER (WHERE t.amount BETWEEN b.p5 AND b.p95) AS trimmed_mean,
AVG(t.amount) AS raw_mean
FROM transactions t
JOIN users u USING (user_id)
JOIN bounds b ON b.country = u.country
WHERE t.status = 'paid'
GROUP BY t.country;Trimmed vs Median
WITH t AS (
SELECT
AVG(amount) AS mean,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median,
-- 5% trimmed
AVG(amount) FILTER (
WHERE amount BETWEEN PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY amount) OVER ()
AND PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY amount) OVER ()
) AS trimmed
FROM transactions
)
SELECT * FROM t;(syntax can vary by dialect — adjust)
| Метрика | Robustness | Sensitivity |
|---|---|---|
| Mean | Low | High |
| Median | Highest | Low |
| Trimmed (5%) | High | Medium-high |
A/B test usage
В A/B-тестах часто использовать trimmed mean для revenue:
WITH bounds AS (
SELECT PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY revenue) AS p99
FROM ab_test_data
)
SELECT
variant,
AVG(revenue) AS raw_arpu,
AVG(LEAST(revenue, b.p99)) AS winsorized_arpu -- 99% cap
FROM ab_test_data t, bounds b
GROUP BY variant;99% winsorization — стандарт в Netflix / Google.
Частые ошибки
Ошибка 1. Wrong trim %. 5% trim — типичный. 50% trim = median. Don't over-trim.
Ошибка 2. Trim per group. Different groups → different cuts? Per-group bounds may be needed.
Ошибка 3. Negative values trim. For revenue (always positive) — only top trim makes sense.
Ошибка 4. Bias introduction. Heavy-tail informative. Trimming hides signal.
Ошибка 5. Inconsistent в A/B. Trim A and B at same percentiles (pooled, not separately).
Связанные темы
- Как посчитать медиану в SQL
- Как посчитать IQR в SQL
- Как посчитать перцентили в SQL
- Как посчитать standard deviation в SQL
FAQ
Какой trim %?
5% — moderate. 10% — aggressive. 1% — only extreme outliers.
Trimmed vs Winsorized?
Trimmed drops. Winsorized caps. Winsorized preserves N.
Use в A/B?
Часто. Revenue / count of actions, heavy-tailed. Trimmed mean — robust effect estimate.
Per-group или pooled trim?
Pooled (combined sample). Per-group → bias.
Trim 50% = median?
Theoretically. Practically median computed differently. Equivalent для symmetric distributions.