Adversarial attacks in the Data Scientist interview
What is an adversarial attack
A small perturbation of the input → wrong prediction, while the change stays imperceptible to humans.
An image of a cat + tiny noise → the model says "dog" with 99% confidence. Discovered by Szegedy et al. in 2013. A fundamental vulnerability of neural networks.
FGSM
Fast Gradient Sign Method. Single step.
x_adv = x + ε · sign(∇_x L(x, y))
Step in the direction that maximizes the loss; ε is the perturbation magnitude.
Simple, fast, weak attack.
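As a minimal sketch, FGSM fits in a few lines of NumPy against a toy logistic-regression model; the weights, input, and the `fgsm` helper below are illustrative assumptions, not something from the article:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """One FGSM step against a logistic-regression model.
    For binary cross-entropy, the input gradient is (p - y) * w."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

# toy model and input (illustrative values)
w, b = np.array([2.0, -1.0]), 0.0
x, y = np.array([0.5, 0.2]), 1.0

x_adv = fgsm(x, y, w, b, eps=0.6)
p_clean = sigmoid(x @ w + b)    # ~0.69 -> class 1
p_adv = sigmoid(x_adv @ w + b)  # drops below 0.5 -> prediction flips
```

Note that each coordinate of `x_adv` differs from `x` by exactly ε, the defining property of the sign step.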
PGD
Projected Gradient Descent. Iterative FGSM.
For N steps:
x_adv = clip(x_adv + α · sign(∇_x L), x - ε, x + ε)
Stronger than FGSM. The standard benchmark attack.
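A PGD sketch under the same toy logistic-regression assumption; `alpha`, `eps`, and the step count are illustrative values, not canonical ones:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pgd(x, y, w, b, eps=0.6, alpha=0.2, steps=5):
    """Iterated gradient-sign steps, projected back onto the
    L-infinity ball of radius eps around the clean input x."""
    x_adv = x.copy()
    for _ in range(steps):
        p = sigmoid(x_adv @ w + b)
        grad = (p - y) * w                        # input gradient of BCE
        x_adv = x_adv + alpha * np.sign(grad)     # small FGSM-style step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # projection
    return x_adv

w, b = np.array([2.0, -1.0]), 0.0
x, y = np.array([0.5, 0.2]), 1.0
x_adv = pgd(x, y, w, b)
```

The `clip` call is the "projection" in the name: no matter how many steps run, the perturbation never exceeds ε in any coordinate.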
Black-box attacks
The attacker has no access to the model's weights or gradients.
- Query-based. Probe predictions, estimate gradients.
- Transfer attacks. Train surrogate model, transfer adversarials.
Surprisingly effective: adversarial examples transfer between models.
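A query-based attack can be sketched by estimating the input gradient with finite differences, using only the model's outputs; `model` and `estimate_grad` are hypothetical names for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.array([2.0, -1.0]), 0.0

def model(x):
    """Black box: the attacker only observes the output probability."""
    return sigmoid(x @ w + b)

def estimate_grad(f, x, h=1e-4):
    """Central finite differences: 2 queries per input dimension."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

x = np.array([0.5, 0.2])  # true class: 1
g = estimate_grad(model, x)
# step against the estimated gradient to lower the class-1 probability
x_adv = x - 0.6 * np.sign(g)
```

The query cost scales with input dimension, which is why practical black-box attacks use smarter gradient estimators than this naive loop.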
Defenses
Adversarial training. Train on adversarial examples.
For each batch: generate adversarials → train on a mix of clean and adversarial examples.
The strongest known defense, but it reduces clean accuracy.
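The loop above can be sketched for a toy logistic-regression model on synthetic data; all names, the dataset, and the hyperparameters here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_batch(X, y, w, b, eps):
    """Per-example FGSM for logistic regression (input grad = (p - y) * w)."""
    p = sigmoid(X @ w + b)
    return X + eps * np.sign((p - y)[:, None] * w)

# toy separable dataset: label = sign of x0 + x1
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b, lr, eps = np.zeros(2), 0.0, 0.1, 0.3
for _ in range(100):
    X_adv = fgsm_batch(X, y, w, b, eps)   # generate adversarials on the fly
    X_mix = np.vstack([X, X_adv])         # mix clean + adversarial
    y_mix = np.concatenate([y, y])
    p = sigmoid(X_mix @ w + b)
    w -= lr * X_mix.T @ (p - y_mix) / len(y_mix)  # BCE gradient step
    b -= lr * np.mean(p - y_mix)

clean_acc = np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1.0))
```

Regenerating the adversarial examples inside the loop matters: attacks computed against an old snapshot of the weights quickly go stale.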
Defensive distillation. Smooth model — gradients harder to exploit. Largely defeated.
Input preprocessing. Random transforms, denoising. Often broken by adaptive attacks.
Detection. Detect adversarial inputs, refuse prediction.
Certified robustness. Mathematical guarantees (randomized smoothing).
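Randomized smoothing, in its simplest form, replaces the base classifier's prediction with a majority vote under Gaussian noise; this toy sketch assumes a logistic-regression base model, and `sigma` and `n` are illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.array([2.0, -1.0]), 0.0

def base_predict(X):
    return (sigmoid(X @ w + b) > 0.5).astype(int)

def smoothed_predict(x, sigma=0.5, n=1000):
    """Majority vote of the base classifier over Gaussian-noised copies of x.
    Cohen et al. (2019) derive a certified L2 radius from the vote margin."""
    noisy = x + sigma * rng.normal(size=(n, x.size))
    return int(base_predict(noisy).mean() > 0.5)

x = np.array([0.5, 0.2])
pred = smoothed_predict(x)  # agrees with the base classifier on this input
```

The guarantee comes from the vote margin, not from the code itself: a confident majority under noise certifies that no small L2 perturbation can flip the smoothed prediction.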
In production this is critical: robust training is mandatory for security-critical apps.
Related topics
- Bias and fairness for DS
- Bayesian NN for DS
- Image classifier system design for DS
- Hallucinations and LLM evals for DS
- Preparing for the Data Scientist interview
FAQ
Is this official information?
No. The article is based on Szegedy et al. 2013, Goodfellow et al. 2014 (FGSM), and Madry et al. 2017 (PGD).