How to Calculate F1 Score in SQL

Why F1

F1 is the harmonic mean of precision and recall. It balances the two, which makes it useful when classes are imbalanced: accuracy lies on skewed data, while F1 stays honest. For example, with 990 negatives and 10 positives, a model that always predicts negative scores 99% accuracy but F1 = 0.

Formula

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × (Precision × Recall) / (Precision + Recall)

F1 ranges from 0 to 1; 1 means a perfect classifier.
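
The queries below use an algebraically equivalent form, F1 = 2 × TP / (2 × TP + FP + FN), obtained by substituting the precision and recall definitions into the harmonic mean. A quick sanity check with made-up counts (TP = 80, FP = 20, FN = 40):

SELECT
    80.0 / (80 + 20)              AS precision,  -- 0.800
    80.0 / (80 + 40)              AS recall,     -- ~0.667
    2.0 * 80 / (2 * 80 + 20 + 40) AS f1;         -- ~0.727, matches 2PR / (P + R)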

Basic Calculation

WITH confusion AS (
    SELECT
        COUNT(*) FILTER (WHERE predicted = 1 AND actual = 1) AS tp,  -- true positives
        COUNT(*) FILTER (WHERE predicted = 1 AND actual = 0) AS fp,  -- false positives
        COUNT(*) FILTER (WHERE predicted = 0 AND actual = 1) AS fn,  -- false negatives
        COUNT(*) FILTER (WHERE predicted = 0 AND actual = 0) AS tn   -- true negatives
    FROM predictions
    WHERE prediction_date >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT
    tp, fp, fn, tn,
    tp::NUMERIC / NULLIF(tp + fp, 0) AS precision,
    tp::NUMERIC / NULLIF(tp + fn, 0) AS recall,
    -- 2PR / (P + R) simplifies to 2TP / (2TP + FP + FN)
    2.0 * tp / NULLIF(2 * tp + fp + fn, 0) AS f1
FROM confusion;
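
All the queries here assume a predictions table along the lines of the sketch below; the column names are this article's convention, not a standard. The binary examples treat predicted and actual as 0/1 flags, while the multi-class section reuses the same columns for class labels.

-- Hypothetical schema assumed throughout this article
CREATE TABLE predictions (
    prediction_date DATE    NOT NULL,
    predicted       INT     NOT NULL,  -- model output: 1 = positive, 0 = negative
    actual          INT     NOT NULL,  -- ground-truth label
    score           NUMERIC            -- predicted probability, used for threshold sweeps
);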

Macro vs Micro (multi-class)

Macro-F1 averages the per-class F1 scores, giving every class equal weight. Micro-F1 pools TP/FP/FN across all classes first and computes a single F1 from the totals.

WITH per_class AS (
    -- One-vs-rest counts for every class present in the ground truth.
    -- Note: classes that appear only in predicted are not enumerated here.
    SELECT
        c.class_name,
        COUNT(*) FILTER (WHERE p.predicted = c.class_name AND p.actual = c.class_name) AS tp,
        COUNT(*) FILTER (WHERE p.predicted = c.class_name AND p.actual <> c.class_name) AS fp,
        COUNT(*) FILTER (WHERE p.predicted <> c.class_name AND p.actual = c.class_name) AS fn
    FROM predictions p
    CROSS JOIN (SELECT DISTINCT actual AS class_name FROM predictions) c
    GROUP BY c.class_name
),
per_class_f1 AS (
    SELECT
        class_name,
        2.0 * tp / NULLIF(2 * tp + fp + fn, 0) AS f1
    FROM per_class
)
SELECT
    AVG(f1) AS macro_f1,  -- unweighted mean of per-class F1
    (SELECT 2.0 * SUM(tp) / NULLIF(SUM(2 * tp + fp + fn), 0)
     FROM per_class) AS micro_f1  -- pool the counts first, compute F1 second
FROM per_class_f1;
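
Note: for single-label multi-class data, micro-F1 collapses to plain accuracy (assuming every predicted class also appears in actual), because each misclassified row contributes exactly one FP to the predicted class and one FN to the actual class. Macro-F1 is the number to watch when minority classes matter.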

By Threshold

For probabilistic classifiers, sweep a grid of thresholds and compute F1 at each:

WITH thresholds AS (
    SELECT unnest(ARRAY[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]) AS threshold
),
f1_per_threshold AS (
    SELECT
        t.threshold,
        COUNT(*) FILTER (WHERE p.score >= t.threshold AND p.actual = 1) AS tp,
        COUNT(*) FILTER (WHERE p.score >= t.threshold AND p.actual = 0) AS fp,
        COUNT(*) FILTER (WHERE p.score < t.threshold AND p.actual = 1) AS fn
    FROM thresholds t
    CROSS JOIN predictions p
    GROUP BY t.threshold
)
SELECT
    threshold,
    tp, fp, fn,
    tp::NUMERIC / NULLIF(tp + fp, 0) AS precision,
    tp::NUMERIC / NULLIF(tp + fn, 0) AS recall,
    2.0 * tp / NULLIF(2 * tp + fp + fn, 0) AS f1
FROM f1_per_threshold
ORDER BY threshold;

Pick the threshold that maximizes F1.
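
One way to do that in SQL is a sketch like the following, which replaces the final SELECT of the query above while keeping its CTEs:

-- Keep the thresholds and f1_per_threshold CTEs above, then:
SELECT
    threshold,
    2.0 * tp / NULLIF(2 * tp + fp + fn, 0) AS f1
FROM f1_per_threshold
ORDER BY f1 DESC NULLS LAST
LIMIT 1;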

F-beta variations

F1 weights precision and recall equally. F-beta lets you shift the balance:

F-beta = (1 + β²) × precision × recall / (β² × precision + recall)
  • F0.5: precision matters more (β=0.5)
  • F1: balanced
  • F2: recall matters more (β=2)
-- F2 example (recall priority): (1 + 2²) × P × R / (2² × P + R)
SELECT
    5.0 * precision * recall / NULLIF(4 * precision + recall, 0) AS f2
FROM ...  -- any relation exposing precision and recall columns
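
For a runnable version, here is a sketch that reuses the confusion CTE from the basic calculation above (the predictions table and its columns are the same assumptions as before):

WITH confusion AS (
    SELECT
        COUNT(*) FILTER (WHERE predicted = 1 AND actual = 1) AS tp,
        COUNT(*) FILTER (WHERE predicted = 1 AND actual = 0) AS fp,
        COUNT(*) FILTER (WHERE predicted = 0 AND actual = 1) AS fn
    FROM predictions
),
pr AS (
    SELECT
        tp::NUMERIC / NULLIF(tp + fp, 0) AS precision,
        tp::NUMERIC / NULLIF(tp + fn, 0) AS recall
    FROM confusion
)
SELECT
    -- (1 + 2²) × P × R / (2² × P + R): recall counts four times as much
    5.0 * precision * recall / NULLIF(4 * precision + recall, 0) AS f2
FROM pr;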

Common Mistakes

Mistake 1. Assuming F1 equals accuracy. They diverge, especially with imbalanced classes.

Mistake 2. Forgetting that F1 ignores true negatives. TN never appears in the formula, so performance on the negative class stays hidden.

Mistake 3. Sticking with the default 0.5 threshold. Sweep thresholds and pick the one that maximizes F1; 0.5 is a convention, not an optimum.

Mistake 4. Confusing the multi-class variants. Per-class F1, macro-F1, and micro-F1 tell different stories; always state which one you report.

Mistake 5. Applying F1 to regression. F1 is a classification metric; use RMSE, MAE, or R² for regression.

FAQ

What is the F1 range?

0 to 1: 0 is a useless classifier, 1 is a perfect one.

What counts as a good F1?

It depends on the domain. Spam filtering: 0.95+. Fraud detection: 0.6-0.8. Medical diagnosis: 0.8+.

F1 vs Accuracy?

With imbalanced classes, F1 reflects reality while accuracy is misleading.

Macro vs Micro F1?

Macro gives each class equal weight; micro gives each data point equal weight. Macro exposes poor performance on minority classes that micro can hide.

F1 ignores the negative class. What can I do?

Use balanced accuracy or AUC when the negative class matters.