How to calculate availability in SQL

Why availability matters

Availability (uptime) is the share of time a service is up (not in downtime). It is the primary SLI. 99.9% allows 8.76 hours of downtime per year, 99.99% about 52 minutes, and 99.999% ("five nines") about 5 minutes. Enterprise contracts often require 99.95%.

Formula

Time-based:

availability = uptime / (uptime + downtime)

Request-based:

availability = successful_requests / total_requests

Request-based is more accurate because it accounts for partial outages.
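
To see how the two formulas can disagree, here is a minimal PostgreSQL sketch on made-up data: three one-minute intervals in which the service answers health checks the whole time, but half of the requests fail during the second minute. The `minute_stats` CTE and its columns are hypothetical and exist only for this illustration.

```sql
-- Hypothetical sample data: one row per minute of observation.
WITH minute_stats(minute_no, is_up, total_requests, successful_requests) AS (
    VALUES
        (1, TRUE, 1000, 1000),
        (2, TRUE, 1000,  500),  -- partial outage: service is "up", half the requests fail
        (3, TRUE, 1000, 1000)
)
SELECT
    -- Time-based: share of minutes in which the service was up.
    COUNT(*) FILTER (WHERE is_up)::NUMERIC / NULLIF(COUNT(*), 0) AS time_based,
    -- Request-based: share of requests that succeeded.
    SUM(successful_requests)::NUMERIC / NULLIF(SUM(total_requests), 0) AS request_based
FROM minute_stats;
```

On this sample the time-based figure is 100%, while the request-based figure is about 83.3% (2500 of 3000 requests), which is the gap a partial outage creates.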

Availability in SQL

Using probes:

SELECT
    COUNT(*) AS total_probes,
    COUNT(*) FILTER (WHERE status = 'success') AS successful,
    COUNT(*) FILTER (WHERE status = 'success')::NUMERIC * 100
    / NULLIF(COUNT(*), 0) AS availability_pct
FROM health_probes
WHERE probe_timestamp >= NOW() - INTERVAL '30 days';

99.95% is a common standard target. 99.99% is a premium tier.

Error budget

With a 99.9% SLO, the error budget is the 0.1% of downtime (here, failed probes) that is allowed:

WITH config AS (
    -- SLO target and length of the evaluation window in days
    SELECT 0.999 AS slo, 30 AS days_in_period
),
period_stats AS (
    SELECT
        COUNT(*) AS total,
        COUNT(*) FILTER (WHERE status = 'success') AS successful
    FROM health_probes, config
    -- use the configured window instead of a second hardcoded '30 days'
    WHERE probe_timestamp >= NOW() - config.days_in_period * INTERVAL '1 day'
)
SELECT
    successful::NUMERIC / NULLIF(total, 0) AS availability,  -- NULL when there are no probes
    config.slo AS slo_target,
    (1 - config.slo) * total AS budget_total,                -- failed probes allowed by the SLO
    total - successful AS budget_used,                       -- failed probes observed
    ((1 - config.slo) * total) - (total - successful) AS budget_remaining,
    (((1 - config.slo) * total) - (total - successful))::NUMERIC * 100
    / NULLIF((1 - config.slo) * total, 0) AS budget_remaining_pct
FROM period_stats, config;

Budget remaining at 60% or more is safe. At 0% or below the budget is exhausted: freeze risky deploys.
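
These thresholds can be turned into a status label. A minimal sketch, assuming `budget_remaining_pct` has already been computed as in the error budget query; the sample rows and the label names are made up, and the middle `caution` band is an assumption, since the text only names the 60% and 0% boundaries.

```sql
-- Hypothetical per-service budget figures, for illustration only.
WITH budget(service, budget_remaining_pct) AS (
    VALUES
        ('api',      82.5),
        ('checkout', 35.0),
        ('search',  -12.0)   -- negative: more failures than the budget allows
)
SELECT
    service,
    budget_remaining_pct,
    CASE
        WHEN budget_remaining_pct >= 60 THEN 'safe'
        WHEN budget_remaining_pct > 0   THEN 'caution'   -- assumed middle band
        ELSE 'freeze risky deploys'                      -- budget exhausted
    END AS budget_status
FROM budget
ORDER BY budget_remaining_pct DESC;
```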

SLO targets

SLO        Allowed downtime / month (30 days)
99.9%      43.2 min
99.95%     21.6 min
99.99%     4.3 min
99.999%    25.9 sec

The same values in SQL:
SELECT
    slo,
    (1 - slo) * 30 * 24 * 60 AS allowed_downtime_min_per_month
FROM (VALUES (0.999), (0.9995), (0.9999), (0.99999)) AS t(slo);

Common mistakes

Mistake 1. Relying on time-based availability, which ignores partial outages. The service is "up", but 50% of requests fail. Request-based is better here.

Mistake 2. Excluding planned maintenance. The customer still sees an outage. Include it in downtime by default.

Mistake 3. Probing from a single location. A single probe location hides regional failures. Use multi-region probes.

Mistake 4. Not excluding test traffic. Synthetic probes inflate the request count. Track real user requests as well.

Mistake 5. An SLO without an error budget. An SLO without budget management is a wishful target. Track budget consumption.
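
The effect of synthetic traffic on a request-based figure can be sketched as follows. This is an illustration under stated assumptions: the `requests` CTE and its `is_synthetic` flag are hypothetical, and real schemas may mark probe traffic differently (for example by user agent or source IP).

```sql
-- Hypothetical request log: three real user requests and three synthetic probes.
WITH requests(request_id, is_synthetic, status) AS (
    VALUES
        (1, FALSE, 'success'),
        (2, FALSE, 'error'),
        (3, FALSE, 'success'),
        (4, TRUE,  'success'),
        (5, TRUE,  'success'),
        (6, TRUE,  'success')
)
SELECT
    -- All traffic, synthetic probes included.
    COUNT(*) FILTER (WHERE status = 'success')::NUMERIC * 100
        / NULLIF(COUNT(*), 0) AS availability_all_pct,
    -- Real user requests only.
    COUNT(*) FILTER (WHERE status = 'success' AND NOT is_synthetic)::NUMERIC * 100
        / NULLIF(COUNT(*) FILTER (WHERE NOT is_synthetic), 0) AS availability_real_pct
FROM requests;
```

On this sample, the figure over all traffic is about 83.3% (5 of 6), while real users see about 66.7% (2 of 3): always-successful probes mask the failed user request.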

FAQ

99.9% or 99.99%?

99.9% is usually enough for most B2B products. 99.99% is for finance, payments, and healthcare.

Time-based or request-based?

Request-based for customer-facing services. Time-based for infrastructure.

The error budget is used up. What now?

Freeze risky deploys and focus on stability work.

Multi-region?

Run probes from 3+ regions. Report both the average and the worst region.
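
A minimal sketch of the average and worst-region figures, assuming each probe row carries a `region` value. The `health_probes_by_region` CTE, the region names, and the sample rows are hypothetical.

```sql
-- Hypothetical probe results from three regions, two probes each.
WITH health_probes_by_region(region, status) AS (
    VALUES
        ('eu-west',  'success'), ('eu-west',  'success'),
        ('us-east',  'success'), ('us-east',  'success'),
        ('ap-south', 'success'), ('ap-south', 'failure')
),
per_region AS (
    -- Availability per region, same formula as the single-location probe query.
    SELECT
        region,
        COUNT(*) FILTER (WHERE status = 'success')::NUMERIC * 100
            / NULLIF(COUNT(*), 0) AS availability_pct
    FROM health_probes_by_region
    GROUP BY region
)
SELECT
    AVG(availability_pct) AS avg_availability_pct,    -- average across regions
    MIN(availability_pct) AS worst_availability_pct   -- the worst single region
FROM per_region;
```

On this sample the average is about 83.3%, but the worst region sits at 50%, which the average alone would hide.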

Should DNS / CDN be included?

Yes. Measure the end-to-end customer experience, not only the origin.