Как посчитать API Latency в SQL

Закрепи формулу api latency в Карьернике
Запомнить надолго — 5 коротких сессий с задачами на эту тему. Бесплатно
Тренировать api latency в Telegram

Зачем API Latency

Latency — time от request до response. AVG latency lies (long tail hidden). Use перцентили: p50 (typical), p95 (slow users), p99 (outliers / SLO boundary).

Перцентили

Метрика Что показывает
p50 (median) Typical experience
p95 Slow 5% (engineering attention)
p99 Tail 1% (SLO boundary)
max Worst case (debug)

Базовый расчёт

SELECT
    DATE_TRUNC('hour', TIMESTAMP) AS hour,
    COUNT(*) AS requests,
    PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY response_ms) AS p50_ms,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY response_ms) AS p95_ms,
    PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY response_ms) AS p99_ms,
    MAX(response_ms) AS max_ms
FROM api_logs
WHERE TIMESTAMP >= NOW() - INTERVAL '24 hours'
GROUP BY 1
ORDER BY 1;

По endpoints

SELECT
    endpoint,
    COUNT(*) AS requests,
    PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY response_ms) AS p50,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY response_ms) AS p95,
    PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY response_ms) AS p99,
    COUNT(*) FILTER (WHERE response_ms > 1000) AS slow_requests
FROM api_logs
WHERE TIMESTAMP >= NOW() - INTERVAL '24 hours'
GROUP BY endpoint
ORDER BY p99 DESC
LIMIT 30;

Top p99 endpoints — engineering priority.

SLO compliance

SLO: «99% of requests < 500ms». Compliance:

WITH stats AS (
    SELECT
        DATE_TRUNC('day', TIMESTAMP) AS day,
        COUNT(*) AS total,
        COUNT(*) FILTER (WHERE response_ms < 500) AS fast_requests
    FROM api_logs
    WHERE TIMESTAMP >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY 1
)
SELECT
    day,
    total,
    fast_requests,
    fast_requests::NUMERIC * 100 / total AS slo_compliance_pct,
    CASE
        WHEN fast_requests::NUMERIC * 100 / total >= 99 THEN 'SLO MET'
        ELSE 'SLO MISSED'
    END AS status
FROM stats
ORDER BY day DESC;
Закрепи формулу api latency в Карьернике
Запомнить надолго — 5 коротких сессий с задачами на эту тему. Бесплатно
Тренировать api latency в Telegram

Slow query identification

SELECT
    endpoint,
    request_id,
    response_ms,
    TIMESTAMP,
    response_code
FROM api_logs
WHERE TIMESTAMP >= NOW() - INTERVAL '1 hour'
  AND response_ms > 2000  -- 2s+ requests
ORDER BY response_ms DESC
LIMIT 50;

Latency vs throughput

SELECT
    DATE_TRUNC('hour', TIMESTAMP) AS hour,
    COUNT(*) AS rpm,  -- requests per hour
    AVG(response_ms) AS avg_latency,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY response_ms) AS p95
FROM api_logs
WHERE TIMESTAMP >= NOW() - INTERVAL '24 hours'
GROUP BY 1
ORDER BY 1;

Latency increases при high throughput → capacity issue.

Частые ошибки

Ошибка 1. Avg latency. 50ms avg hides 5000ms p99. Use перцентили.

Ошибка 2. Server-side only. Server processed в 200ms. Client received после 800ms (network). Track e2e.

Ошибка 3. Sample bias. Logging 10% — percentiles unreliable. Sample более.

Ошибка 4. Outliers как «normal slow». p99 5000ms — нормально или нет? Compare to baseline.

Ошибка 5. Cold starts. Serverless cold start spikes p99. Track warm vs cold.

Связанные темы

FAQ

Какие percentiles track?

p50, p95, p99 — стандарт. Sometimes p99.9 для critical APIs.

AVG vs Median?

Median (p50) robust. AVG sensitive to outliers.

Какой p95 ok?

Internal API: < 100ms. Public API: < 500ms. Page load: < 2s.

Tail latency growing?

Investigate: 1) DB slow queries. 2) Cache misses. 3) GC pauses. 4) Network.

SLO budget?

99% SLO = 1% error budget. Once exhausted → freeze deploys, fix.