Model deployment strategies in the Data Scientist interview
REST API
Standard online inference.
POST /predict {features: ...} → {prediction: ...}
Tools: FastAPI, Triton, TorchServe, TF Serving, BentoML.
Pros: flexible, language-agnostic.
Cons: network latency.
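A minimal sketch of such an endpoint with FastAPI, assuming a scikit-learn-style model serialized to model.pkl (the file name and the feature schema here are hypothetical):

from fastapi import FastAPI
from pydantic import BaseModel
import joblib

app = FastAPI()
model = joblib.load("model.pkl")  # hypothetical pre-trained model, loaded once at startup

class Features(BaseModel):
    features: list[float]

@app.post("/predict")
def predict(body: Features):
    pred = model.predict([body.features])[0]  # wrap in a one-row batch
    return {"prediction": float(pred)}

Run with uvicorn app:app; any HTTP client can then POST JSON, which is what makes the pattern language-agnostic.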
Batch inference
Process records in bulk, asynchronously.
Spark / dbt-ml / SageMaker Batch Transform.
Read 1M records → predict → write predictions.
Pros: efficient throughput.
Cons: not real-time.
Use cases: nightly scoring, bulk reports.
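A minimal batch-scoring sketch with pandas and joblib (the file names and the FEATURES column list are hypothetical; at millions of rows you would typically reach for Spark or SageMaker Batch Transform instead):

import joblib
import pandas as pd

FEATURES = ["f1", "f2", "f3"]  # hypothetical feature columns

model = joblib.load("model.pkl")                # hypothetical pre-trained model
df = pd.read_parquet("records.parquet")         # e.g. the nightly dump of records
df["prediction"] = model.predict(df[FEATURES])  # score everything in one pass
df[["id", "prediction"]].to_parquet("predictions.parquet")

The single vectorized predict call over the whole frame is where the throughput advantage over per-request APIs comes from.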
Embedded
Model lives inside the application. No external service.
import joblib

model = joblib.load('model.pkl')   # deserialize the trained model once at startup
prediction = model.predict(x)      # in-process call, no network hop
Pros: no network. Lower latency.
Cons: model updates require app deploy.
Typical packaging: ONNX, native libraries.
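For the ONNX route, a sketch with onnxruntime, assuming the model was exported to model.onnx beforehand (the file name and input shape are hypothetical):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx")     # load once at app startup
input_name = sess.get_inputs()[0].name        # input name comes from the exported graph
x = np.random.rand(1, 4).astype(np.float32)   # dummy feature vector
outputs = sess.run(None, {input_name: x})     # None = return all model outputs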
Edge / mobile
On-device inference.
- TensorFlow Lite: mobile / embedded.
- Core ML: iOS.
- ONNX Runtime Mobile: cross-platform.
- llama.cpp: LLMs on CPU / mobile.
Pros: works offline, preserves privacy, low latency.
Cons: smaller models, hardware variance.
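As an illustration of on-device inference, a sketch with the TensorFlow Lite interpreter, assuming a model already converted to model.tflite (the file name is hypothetical; on an actual device you would use the lighter tflite-runtime package):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

x = np.random.rand(*inp["shape"]).astype(np.float32)  # dummy input matching the model's shape
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
prediction = interpreter.get_tensor(out["index"])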
Streaming
Process events from Kafka / Pulsar.
Kafka topic → Spark / Flink → predict → output topic / DB.
Pros: continuous low-latency processing.
Cons: complex infra.
Use cases: fraud detection, real-time recommendations.
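A minimal consume-predict-produce loop with kafka-python (the topic names, message schema, and model file are hypothetical; production systems usually run this inside Flink or Spark Structured Streaming):

import json
import joblib
from kafka import KafkaConsumer, KafkaProducer

model = joblib.load("model.pkl")  # hypothetical pre-trained model

consumer = KafkaConsumer(
    "features",                                  # hypothetical input topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode(),
)

for msg in consumer:                             # blocks, scoring events as they arrive
    pred = model.predict([msg.value["features"]])[0]
    producer.send("predictions", {"id": msg.value["id"], "prediction": float(pred)})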
Related topics
- Inference optimization for DS
- ML latency optimization for DS
- Model versioning for DS
- Canary and shadow deployment in ML for DS
- Preparing for the Data Scientist interview
FAQ
Is this official information?
No. The article is based on ML deployment industry practices.