Как посчитать RMSE в SQL
Содержание:
Зачем RMSE
RMSE = Root Mean Squared Error. Главная метрика regression моделей. Penalizes large errors more (squared). Same units as target (unlike MSE).
Формула
RMSE = sqrt(mean((predicted - actual)²))Базовый расчёт
WITH errors AS (
SELECT
prediction_id,
predicted_value,
actual_value,
predicted_value - actual_value AS error,
POWER(predicted_value - actual_value, 2) AS squared_error
FROM predictions
WHERE prediction_date >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT
COUNT(*) AS predictions,
AVG(error) AS mean_error, -- bias
AVG(squared_error) AS mse,
SQRT(AVG(squared_error)) AS rmse,
AVG(ABS(error)) AS mae
FROM errors;RMSE vs MAE
| Метрика | Penalizes outliers | Robust |
|---|---|---|
| MAE | Linearly | Yes |
| RMSE | Quadratically (much more) | No |
Pick:
- MAE — outliers expected, don't want to over-penalize
- RMSE — large errors disproportionately bad
SELECT
AVG(ABS(error)) AS mae,
SQRT(AVG(POWER(error, 2))) AS rmse,
SQRT(AVG(POWER(error, 2))) / NULLIF(AVG(ABS(error)), 0) AS rmse_mae_ratio
FROM (SELECT predicted_value - actual_value AS error FROM predictions) e;Если ratio > 1.5 → outliers driving RMSE up.
По segments
SELECT
segment,
COUNT(*) AS predictions,
SQRT(AVG(POWER(predicted_value - actual_value, 2))) AS rmse,
AVG(ABS(predicted_value - actual_value)) AS mae,
AVG(actual_value) AS avg_actual
FROM predictions
WHERE prediction_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY segment
ORDER BY rmse DESC;High-error segments — model retraining / feature engineering.
Normalized RMSE
For comparison across scales:
WITH stats AS (
SELECT
SQRT(AVG(POWER(predicted_value - actual_value, 2))) AS rmse,
AVG(actual_value) AS avg_actual,
MAX(actual_value) - MIN(actual_value) AS RANGE
FROM predictions
)
SELECT
rmse,
rmse / NULLIF(avg_actual, 0) AS rmse_pct_of_mean,
rmse / NULLIF(RANGE, 0) AS rmse_pct_of_range
FROM stats;RMSE как % of mean — interpretable.
RMSE over time
SELECT
DATE_TRUNC('week', prediction_date) AS week,
COUNT(*) AS predictions,
SQRT(AVG(POWER(predicted_value - actual_value, 2))) AS rmse
FROM predictions
WHERE prediction_date >= CURRENT_DATE - INTERVAL '12 weeks'
GROUP BY 1
ORDER BY 1;RMSE растёт → concept drift, retrain.
Частые ошибки
Ошибка 1. RMSE без context. RMSE = 5. Good или bad? Depends on target scale.
Ошибка 2. Outliers dominate. 1 huge error inflates RMSE. MAE robust alternative.
Ошибка 3. Train RMSE vs Test RMSE. Optimization on train. Evaluation on test. Don't mix.
Ошибка 4. Different units. Compare RMSE для price ($) vs probability (0-1). Different ranges.
Ошибка 5. RMSE = SD? Only if predictions = mean (constant model). Else different.
Связанные темы
- Как посчитать MAE в SQL
- Как посчитать MAPE в SQL
- Как посчитать R² в SQL
- Linear regression для аналитика
FAQ
RMSE units?
Same as target. Predicting prices в USD — RMSE в USD.
Какой RMSE acceptable?
Depends on target scale. RMSE 5 на price $50 — bad (10% error). На price $5000 — excellent.
MAE или RMSE?
MAE — robust. RMSE — outlier-sensitive. Both report.
RMSE > 0 — always?
Yes (RMSE always non-negative, only 0 если perfect predictions).
Train vs Test RMSE?
Train RMSE → overfitting risk. Test RMSE → real model performance.