How to calculate availability in SQL
Why availability matters
Availability (uptime) is the share of time a service is up, that is, not in downtime. It is the primary SLI. 99.9% allows about 8.76 hours of downtime per year, 99.99% about 52.6 minutes, and 99.999% ("five nines") about 5.3 minutes. Enterprise contracts often require 99.95%.
Formula
Time-based:
availability = uptime / (uptime + downtime)
Request-based:
availability = successful_requests / total_requests
Request-based is more accurate because it accounts for partial outages.
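The time-based formula can also be sketched directly in PostgreSQL. This example assumes a hypothetical `incidents` table with `started_at` and `ended_at` columns, which is not part of the article's schema. Sample rows and a fixed 30-day window are inlined so the query runs on its own.

```sql
-- Time-based availability over a fixed window.
-- `incidents` is a hypothetical table; the rows below are sample data.
WITH incidents(started_at, ended_at) AS (
    VALUES
        (TIMESTAMP '2024-01-05 10:00', TIMESTAMP '2024-01-05 10:30'),
        (TIMESTAMP '2024-01-20 02:00', TIMESTAMP '2024-01-20 02:13')
),
period AS (
    SELECT TIMESTAMP '2024-01-01 00:00' AS period_start,
           TIMESTAMP '2024-01-31 00:00' AS period_end
),
downtime AS (
    -- Clip each incident to the window, then sum the durations in seconds.
    SELECT COALESCE(SUM(EXTRACT(EPOCH FROM (
               LEAST(i.ended_at, p.period_end)
               - GREATEST(i.started_at, p.period_start)
           ))), 0) AS downtime_sec
    FROM incidents i
    JOIN period p
      ON i.started_at < p.period_end
     AND i.ended_at > p.period_start
)
SELECT
    d.downtime_sec,
    EXTRACT(EPOCH FROM (p.period_end - p.period_start)) AS period_sec,
    -- uptime / (uptime + downtime) is the same as 1 - downtime / period
    (1 - d.downtime_sec
         / EXTRACT(EPOCH FROM (p.period_end - p.period_start))) * 100
        AS availability_pct
FROM downtime d, period p;
```

The sketch assumes incidents do not overlap. Overlapping rows would be counted twice and would need to be merged first.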
Availability in SQL
Using probes:
-- Share of successful health probes over the last 30 days
SELECT
    COUNT(*) AS total_probes,
    COUNT(*) FILTER (WHERE status = 'success') AS successful,
    COUNT(*) FILTER (WHERE status = 'success')::NUMERIC * 100
        / NULLIF(COUNT(*), 0) AS availability_pct
FROM health_probes
WHERE probe_timestamp >= NOW() - INTERVAL '30 days';
99.95% is a common standard target; 99.99% is a premium tier.
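A single 30-day figure can hide one or two bad days. As a possible extension, the same `health_probes` table can be grouped by day to surface the worst days first. The 10-row limit and the rounding to three decimals are arbitrary choices for illustration.

```sql
-- Daily availability from health_probes, worst days first.
SELECT
    date_trunc('day', probe_timestamp) AS day,
    COUNT(*) AS total_probes,
    COUNT(*) FILTER (WHERE status = 'success') AS successful,
    ROUND(
        COUNT(*) FILTER (WHERE status = 'success')::NUMERIC * 100
            / NULLIF(COUNT(*), 0),
        3
    ) AS availability_pct
FROM health_probes
WHERE probe_timestamp >= NOW() - INTERVAL '30 days'
GROUP BY 1
ORDER BY availability_pct ASC, day
LIMIT 10;
```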
Error budget
With a 99.9% SLO, the error budget is the 0.1% of probes (or time) that is allowed to fail:
WITH config AS (
    SELECT 0.999 AS slo, 30 AS days_in_period
),
period_stats AS (
    SELECT
        COUNT(*) AS total,
        COUNT(*) FILTER (WHERE status = 'success') AS successful
    FROM health_probes, config
    -- Window length comes from config instead of a second hard-coded literal
    WHERE probe_timestamp >= NOW() - config.days_in_period * INTERVAL '1 day'
)
SELECT
    -- NULLIF avoids division by zero when the window has no probes
    successful::NUMERIC / NULLIF(total, 0) AS availability,
    config.slo AS slo_target,
    (1 - config.slo) * total AS budget_total,   -- failed probes the SLO allows
    total - successful AS budget_used,          -- failed probes observed
    ((1 - config.slo) * total) - (total - successful) AS budget_remaining,
    (((1 - config.slo) * total) - (total - successful))::NUMERIC * 100
        / NULLIF((1 - config.slo) * total, 0) AS budget_remaining_pct
FROM period_stats, config;
A remaining budget of 60% or more is safe. At 0% or below, freeze deploys.
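The thresholds above can be turned into a status label with a `CASE` expression. This is a minimal sketch. The sample percentages are inlined, and the middle "caution" band is an assumption, since the text only names the 60% and 0% cut-offs.

```sql
-- Map remaining error budget (percent) to a deploy policy.
-- In practice budget_remaining_pct would come from the error budget query.
SELECT
    budget_remaining_pct,
    CASE
        WHEN budget_remaining_pct >= 60 THEN 'safe'
        WHEN budget_remaining_pct > 0   THEN 'caution'  -- assumed middle band
        ELSE 'freeze deploys'
    END AS budget_status
FROM (VALUES (85.0), (35.0), (0.0), (-20.0)) AS t(budget_remaining_pct);
```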
SLO targets
| SLO | Allowed downtime per 30-day month |
|---|---|
| 99.9% | 43.2 min |
| 99.95% | 21.6 min |
| 99.99% | 4.3 min |
| 99.999% | 25.9 sec |
-- Allowed downtime in minutes for a 30-day month
SELECT
    slo,
    (1 - slo) * 30 * 24 * 60 AS allowed_downtime_min_per_month
FROM (VALUES (0.999), (0.9995), (0.9999), (0.99999)) AS t(slo);
Common mistakes
Mistake 1. Time-based measurement ignores partial outages. The service counts as "up" while 50% of requests fail. Request-based measurement handles this better.
Mistake 2. Excluding planned maintenance. The customer still sees an outage, so include it in downtime by default.
Mistake 3. Probing from a single location. A single probe location hides regional outages. Use multi-region probes.
Mistake 4. Not excluding test traffic. Synthetic probes inflate the request count. Track real user requests as well.
Mistake 5. An SLO without an error budget. Without budget management, an SLO is a wishful target. Track budget consumption.
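Mistakes 1 and 4 both point toward request-based availability computed on real user traffic only. The sketch below assumes a hypothetical `request_log` with `status_code` and `is_synthetic` columns, and treats any non-5xx response as a success. Neither the schema nor that success rule comes from the article.

```sql
-- Request-based availability on real user traffic only.
-- `request_log` is hypothetical; the rows below are sample data.
WITH request_log(status_code, is_synthetic) AS (
    VALUES
        (200, FALSE), (200, FALSE), (503, FALSE), (200, FALSE),
        (200, TRUE),  (200, TRUE)
)
SELECT
    COUNT(*) AS total_requests,
    COUNT(*) FILTER (WHERE status_code < 500) AS successful_requests,
    ROUND(
        COUNT(*) FILTER (WHERE status_code < 500)::NUMERIC * 100
            / NULLIF(COUNT(*), 0),
        2
    ) AS availability_pct
FROM request_log
WHERE NOT is_synthetic;  -- drop synthetic probes before counting
```

With the sample rows, the two synthetic requests are dropped, so the one failure counts against four real requests instead of six.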
Related topics
- How to calculate error rate in SQL
- How to calculate API latency in SQL
- How to calculate Apdex score in SQL
- How to calculate latency percentiles in SQL
FAQ
99.9% or 99.99%?
99.9% is usually enough for most B2B products. 99.99% fits finance, payments, and healthcare.
Time-based or request-based?
Request-based for customer-facing services, time-based for infrastructure.
The error budget is used up. What now?
Freeze risky deploys and focus on stability work.
Multi-region?
Run probes from 3+ regions and report both the average and the worst region.
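One way to report the average and the worst region is to compute availability per region first and then aggregate. The sketch assumes each probe row carries a `region` value, which the article's `health_probes` schema does not show. The region names and rows are made-up sample data.

```sql
-- Per-region availability, then the average and the worst region.
-- `regional_probes` and its region names are hypothetical sample data.
WITH regional_probes(region, status) AS (
    VALUES
        ('eu-west',  'success'), ('eu-west',  'success'),
        ('us-east',  'success'), ('us-east',  'failure'),
        ('ap-south', 'success'), ('ap-south', 'success')
),
per_region AS (
    SELECT
        region,
        COUNT(*) FILTER (WHERE status = 'success')::NUMERIC * 100
            / NULLIF(COUNT(*), 0) AS availability_pct
    FROM regional_probes
    GROUP BY region
)
SELECT
    ROUND(AVG(availability_pct), 2) AS avg_availability_pct,
    ROUND(MIN(availability_pct), 2) AS worst_availability_pct
FROM per_region;
```

The average alone can look healthy while one region is badly degraded, which is why the worst-region figure is reported alongside it.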
Should DNS and CDN be included?
Yes. Measure the end-to-end customer experience, not just the origin.