Как посчитать API Latency в SQL
Содержание:
Зачем API Latency
Latency — time от request до response. AVG latency lies (long tail hidden). Use перцентили: p50 (typical), p95 (slow users), p99 (outliers / SLO boundary).
Перцентили
| Метрика | Что показывает |
|---|---|
| p50 (median) | Typical experience |
| p95 | Slow 5% (engineering attention) |
| p99 | Tail 1% (SLO boundary) |
| max | Worst case (debug) |
Базовый расчёт
SELECT
DATE_TRUNC('hour', TIMESTAMP) AS hour,
COUNT(*) AS requests,
PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY response_ms) AS p50_ms,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY response_ms) AS p95_ms,
PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY response_ms) AS p99_ms,
MAX(response_ms) AS max_ms
FROM api_logs
WHERE TIMESTAMP >= NOW() - INTERVAL '24 hours'
GROUP BY 1
ORDER BY 1;По endpoints
SELECT
endpoint,
COUNT(*) AS requests,
PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY response_ms) AS p50,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY response_ms) AS p95,
PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY response_ms) AS p99,
COUNT(*) FILTER (WHERE response_ms > 1000) AS slow_requests
FROM api_logs
WHERE TIMESTAMP >= NOW() - INTERVAL '24 hours'
GROUP BY endpoint
ORDER BY p99 DESC
LIMIT 30;Top p99 endpoints — engineering priority.
SLO compliance
SLO: «99% of requests < 500ms». Compliance:
WITH stats AS (
SELECT
DATE_TRUNC('day', TIMESTAMP) AS day,
COUNT(*) AS total,
COUNT(*) FILTER (WHERE response_ms < 500) AS fast_requests
FROM api_logs
WHERE TIMESTAMP >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1
)
SELECT
day,
total,
fast_requests,
fast_requests::NUMERIC * 100 / total AS slo_compliance_pct,
CASE
WHEN fast_requests::NUMERIC * 100 / total >= 99 THEN 'SLO MET'
ELSE 'SLO MISSED'
END AS status
FROM stats
ORDER BY day DESC;Slow query identification
SELECT
endpoint,
request_id,
response_ms,
TIMESTAMP,
response_code
FROM api_logs
WHERE TIMESTAMP >= NOW() - INTERVAL '1 hour'
AND response_ms > 2000 -- 2s+ requests
ORDER BY response_ms DESC
LIMIT 50;Latency vs throughput
SELECT
DATE_TRUNC('hour', TIMESTAMP) AS hour,
COUNT(*) AS rpm, -- requests per hour
AVG(response_ms) AS avg_latency,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY response_ms) AS p95
FROM api_logs
WHERE TIMESTAMP >= NOW() - INTERVAL '24 hours'
GROUP BY 1
ORDER BY 1;Latency increases при high throughput → capacity issue.
Частые ошибки
Ошибка 1. Avg latency. 50ms avg hides 5000ms p99. Use перцентили.
Ошибка 2. Server-side only. Server processed в 200ms. Client received после 800ms (network). Track e2e.
Ошибка 3. Sample bias. Logging 10% — percentiles unreliable. Sample более.
Ошибка 4. Outliers как «normal slow». p99 5000ms — нормально или нет? Compare to baseline.
Ошибка 5. Cold starts. Serverless cold start spikes p99. Track warm vs cold.
Связанные темы
- Как посчитать перцентили в SQL
- Как посчитать error rate в SQL
- SLA / SLO / SLI на собесе SA
- Incident response на собесе SA
FAQ
Какие percentiles track?
p50, p95, p99 — стандарт. Sometimes p99.9 для critical APIs.
AVG vs Median?
Median (p50) robust. AVG sensitive to outliers.
Какой p95 ok?
Internal API: < 100ms. Public API: < 500ms. Page load: < 2s.
Tail latency growing?
Investigate: 1) DB slow queries. 2) Cache misses. 3) GC pauses. 4) Network.
SLO budget?
99% SLO = 1% error budget. Once exhausted → freeze deploys, fix.