Chaos engineering in the system analyst interview
The idea
Practiced at Netflix since around 2010: inject failures in a controlled way to find weaknesses before customers experience them.
«Chaos engineering doesn't cause chaos — it reveals chaos that's already there.»
Principles
- Build a hypothesis around steady state: define what "normal" looks like.
- Vary real-world events: network, server, and dependency failures.
- Run experiments in production (where possible).
- Automate experiments.
- Minimize blast radius.
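The principles above can be sketched as a small experiment loop. This is a minimal illustration rather than a reference implementation: `SteadyState`, `run_experiment`, and the `probe` / `inject` / `rollback` callbacks are hypothetical names, and "normal" is assumed to mean that the share of successful probes stays at or above a chosen floor.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class SteadyState:
    """Assumed definition of 'normal': success ratio stays at or above a floor."""
    min_success_ratio: float

    def holds(self, outcomes: Sequence[bool]) -> bool:
        if not outcomes:
            return False  # no data, so steady state cannot be claimed
        return sum(outcomes) / len(outcomes) >= self.min_success_ratio


def run_experiment(
    steady_state: SteadyState,
    probe: Callable[[], bool],      # hypothetical health check: True means success
    inject: Callable[[], None],     # hypothetical fault switch-on
    rollback: Callable[[], None],   # hypothetical fault switch-off
    samples: int = 100,
) -> str:
    # 1. Verify the system is in steady state before touching it.
    baseline = [probe() for _ in range(samples)]
    if not steady_state.holds(baseline):
        return "skipped: not in steady state before injection"

    # 2. Inject the fault, observe, and always roll back.
    try:
        inject()
        during = [probe() for _ in range(samples)]
    finally:
        rollback()

    # 3. Compare the observation with the hypothesis.
    return "hypothesis held" if steady_state.holds(during) else "weakness found"
```

The rollback sits in a `finally` block so that the fault is removed even if a probe raises, which keeps the blast radius from outliving the experiment.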
Fault injection types
Resource exhaustion. CPU spike, memory leak, disk full.
Network. Latency, packet loss, DNS failures, partition.
Service failure. Shut down instances, slow responses.
Region failure. Simulate the loss of a whole AWS region.
Time skew. Clock drift between nodes.
Data corruption. Bad inputs, malformed events.
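Two of the fault types above (added latency and service errors) can be illustrated with a small wrapper around a call to a dependency. This is a sketch under stated assumptions: `inject_faults` and `InjectedFault` are hypothetical names, and real tooling usually injects faults at the network or infrastructure level instead of inside application code.

```python
import random
import time
from functools import wraps
from typing import Callable, Optional


class InjectedFault(Exception):
    """Raised in place of a real dependency failure (hypothetical type)."""


def inject_faults(
    latency_s: float = 0.0,
    error_rate: float = 0.0,
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
):
    """Decorator that delays a call and fails it with probability error_rate."""
    rng = rng or random.Random()

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if latency_s > 0:
                sleep(latency_s)             # simulated network latency
            if rng.random() < error_rate:    # simulated service failure
                raise InjectedFault(f"injected failure in {func.__name__}")
            return func(*args, **kwargs)

        return wrapper

    return decorator


@inject_faults(latency_s=0.05, error_rate=0.1)
def fetch_profile(user_id: str) -> dict:
    # Stand-in for a call to a real dependency.
    return {"id": user_id}
```

Passing `rng` and `sleep` as parameters keeps the wrapper testable: a seeded generator makes failures reproducible, and a stubbed `sleep` avoids real delays.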
Blast radius
Limit the blast radius until you are confident.
Stage 1: 1% traffic affected.
Stage 2: 5%.
Stage 3: 25%.
Stage 4: full traffic.
Abort if metrics deteriorate.
Run in production only with a tested hypothesis.
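The staged ramp with an abort condition might look roughly like this. The function names and the hash-based cohort assignment are assumptions made for illustration; the stage fractions are the ones listed above.

```python
import hashlib
from typing import Callable, List, Sequence, Tuple

STAGES: Tuple[float, ...] = (0.01, 0.05, 0.25, 1.00)


def in_blast_radius(user_id: str, fraction: float) -> bool:
    """Deterministically place a user in the affected cohort.

    The hash maps each user to a stable bucket in [0, 1), so a user
    affected at 1% is still affected at 5%, 25%, and 100%.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < fraction


def ramp_blast_radius(
    run_stage: Callable[[float], None],   # hypothetical: apply the fault to this traffic share
    metrics_ok: Callable[[], bool],       # hypothetical: steady-state check after each stage
    stages: Sequence[float] = STAGES,
) -> Tuple[List[float], bool]:
    """Run stages in order; stop at the first one where metrics deteriorate.

    Returns (stages that passed, whether the ramp was aborted).
    """
    passed: List[float] = []
    for fraction in stages:
        run_stage(fraction)
        if not metrics_ok():
            return passed, True   # abort: do not widen the blast radius
        passed.append(fraction)
    return passed, False
```

A stable hash is used instead of a random draw per request so that the same users stay in the experiment across stages, which makes their metrics comparable.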
Game days
A scheduled exercise: the team intentionally breaks something and observes the response.
What it tests:
- Monitoring catches the failure.
- Runbooks are accurate.
- Team escalation paths work.
- Restoration procedures work.
This is stronger than relying on reactive incident response alone: it is practice, like a fire drill.
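One way to keep a game day honest is to record each of the four checks explicitly and list what failed. The sketch below is hypothetical throughout: `GameDayResult` and the check names are invented for illustration and are not part of any tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The four things a game day is meant to test, as listed above.
CHECKS = (
    "monitoring_caught_it",
    "runbook_was_accurate",
    "escalation_worked",
    "restoration_worked",
)


@dataclass
class GameDayResult:
    scenario: str
    outcomes: Dict[str, bool] = field(default_factory=dict)

    def record(self, check: str, passed: bool) -> None:
        if check not in CHECKS:
            raise ValueError(f"unknown check: {check}")
        self.outcomes[check] = passed

    def gaps(self) -> List[str]:
        # A check that was never recorded counts as a gap, not as a pass.
        return [c for c in CHECKS if not self.outcomes.get(c, False)]


result = GameDayResult(scenario="primary database failover")
result.record("monitoring_caught_it", True)
result.record("runbook_was_accurate", False)
result.record("escalation_worked", True)
print(result.gaps())
```

Treating an unrecorded check as a gap means a skipped step shows up in the follow-up list instead of silently passing.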
Tools
Chaos Monkey (Netflix). Kills random instances in AWS.
Gremlin. Commercial. Extensive fault library.
Litmus. Kubernetes-native chaos, configured through CRDs.
AWS FIS. Fault Injection Simulator. Managed.
In Russia: more often DIY scripts plus Yandex.Tank for load generation.
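A DIY script in the spirit of Chaos Monkey can be very small. The sketch below is not Chaos Monkey's actual code or API: `kill_random_instance` is a hypothetical helper, and `terminate` stands for whatever call your platform provides to stop an instance.

```python
import random
from typing import Callable, Optional, Sequence


def kill_random_instance(
    instances: Sequence[str],
    terminate: Callable[[str], None],   # hypothetical platform call
    rng: Optional[random.Random] = None,
    dry_run: bool = True,
) -> Optional[str]:
    """Pick one instance at random and terminate it unless dry_run is set.

    Returns the chosen instance id, or None if there is nothing to pick.
    """
    if not instances:
        return None
    rng = rng or random.Random()
    victim = rng.choice(list(instances))
    if not dry_run:
        terminate(victim)
    return victim


# Dry run by default: report what would be killed without doing it.
candidate = kill_random_instance(["web-1", "web-2", "web-3"], terminate=print)
```

Defaulting to `dry_run=True` is a deliberate safety choice: the destructive path has to be requested explicitly.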
Related topics
- SLA, SLO, and SLI for system analysts
- Circuit breaker for system analysts
- Capacity planning for system analysts
- Distributed locks for system analysts
- Preparing for the system analyst interview
FAQ
Is this official information?
No. The article is based on the Principles of Chaos Engineering and the Netflix engineering blog.