Как посчитать RMSE в SQL

Закрепи формулу rmse в Карьернике
Запомнить надолго — 5 коротких сессий с задачами на эту тему. Бесплатно
Тренировать rmse в Telegram

Зачем RMSE

RMSE = Root Mean Squared Error. Главная метрика regression моделей. Penalizes large errors more (squared). Same units as target (unlike MSE).

Формула

RMSE = sqrt(mean((predicted - actual)²))

Базовый расчёт

WITH errors AS (
    SELECT
        prediction_id,
        predicted_value,
        actual_value,
        predicted_value - actual_value AS error,
        POWER(predicted_value - actual_value, 2) AS squared_error
    FROM predictions
    WHERE prediction_date >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT
    COUNT(*) AS predictions,
    AVG(error) AS mean_error,  -- bias
    AVG(squared_error) AS mse,
    SQRT(AVG(squared_error)) AS rmse,
    AVG(ABS(error)) AS mae
FROM errors;

RMSE vs MAE

Метрика Penalizes outliers Robust
MAE Linearly Yes
RMSE Quadratically (much more) No

Pick:

  • MAE — outliers expected, don't want to over-penalize
  • RMSE — large errors disproportionately bad
SELECT
    AVG(ABS(error)) AS mae,
    SQRT(AVG(POWER(error, 2))) AS rmse,
    SQRT(AVG(POWER(error, 2))) / NULLIF(AVG(ABS(error)), 0) AS rmse_mae_ratio
FROM (SELECT predicted_value - actual_value AS error FROM predictions) e;

Если ratio > 1.5 → outliers driving RMSE up.

По segments

SELECT
    segment,
    COUNT(*) AS predictions,
    SQRT(AVG(POWER(predicted_value - actual_value, 2))) AS rmse,
    AVG(ABS(predicted_value - actual_value)) AS mae,
    AVG(actual_value) AS avg_actual
FROM predictions
WHERE prediction_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY segment
ORDER BY rmse DESC;

High-error segments — model retraining / feature engineering.

Закрепи формулу rmse в Карьернике
Запомнить надолго — 5 коротких сессий с задачами на эту тему. Бесплатно
Тренировать rmse в Telegram

Normalized RMSE

For comparison across scales:

WITH stats AS (
    SELECT
        SQRT(AVG(POWER(predicted_value - actual_value, 2))) AS rmse,
        AVG(actual_value) AS avg_actual,
        MAX(actual_value) - MIN(actual_value) AS RANGE
    FROM predictions
)
SELECT
    rmse,
    rmse / NULLIF(avg_actual, 0) AS rmse_pct_of_mean,
    rmse / NULLIF(RANGE, 0) AS rmse_pct_of_range
FROM stats;

RMSE как % of mean — interpretable.

RMSE over time

SELECT
    DATE_TRUNC('week', prediction_date) AS week,
    COUNT(*) AS predictions,
    SQRT(AVG(POWER(predicted_value - actual_value, 2))) AS rmse
FROM predictions
WHERE prediction_date >= CURRENT_DATE - INTERVAL '12 weeks'
GROUP BY 1
ORDER BY 1;

RMSE растёт → concept drift, retrain.

Частые ошибки

Ошибка 1. RMSE без context. RMSE = 5. Good или bad? Depends on target scale.

Ошибка 2. Outliers dominate. 1 huge error inflates RMSE. MAE robust alternative.

Ошибка 3. Train RMSE vs Test RMSE. Optimization on train. Evaluation on test. Don't mix.

Ошибка 4. Different units. Compare RMSE для price ($) vs probability (0-1). Different ranges.

Ошибка 5. RMSE = SD? Only if predictions = mean (constant model). Else different.

Связанные темы

FAQ

RMSE units?

Same as target. Predicting prices в USD — RMSE в USD.

Какой RMSE acceptable?

Depends on target scale. RMSE 5 на price $50 — bad (10% error). На price $5000 — excellent.

MAE или RMSE?

MAE — robust. RMSE — outlier-sensitive. Both report.

RMSE > 0 — always?

Yes (RMSE always non-negative, only 0 если perfect predictions).

Train vs Test RMSE?

Train RMSE → overfitting risk. Test RMSE → real model performance.