How an Analyst Works with an ML Team
The analyst's role in ML
In large companies the ML team and analysts work closely together. The analyst is the bridge between data science and product.
Typical analyst contributions:
- Feature engineering input.
- Data quality validation.
- Model evaluation against business metrics.
- Production monitoring.
- Experiment design.
- Translating ML results for stakeholders.
The analyst does not build the models, but without their work the models don't deliver value.
Data preparation phase
Before training a model you need clean, labeled, representative data. This is where the analyst's contribution is key.
Identifying data sources.
Which events and tables contain the relevant signals? What are their schemas, coverage, and latency?
The analyst usually knows the data landscape better than the ML engineer. Guide them in the choice of sources.
Creating labels.
Supervised learning needs labels. How do you define 'churn'? A 'fraudulent transaction'? A 'relevant recommendation'?
These definitions are often analytical work; ML uses them for training.
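A rough sketch of such a definition in pandas, assuming a hypothetical events table with user_id and event_date columns and a 30-day inactivity rule (the real window is a product decision):
import pandas as pd

CUTOFF = pd.Timestamp('2024-01-01')   # assumed reference date
CHURN_WINDOW_DAYS = 30                # assumed business rule

last_event = (
    events[events['event_date'] < CUTOFF]    # events: assumed activity log
    .groupby('user_id')['event_date']
    .max()
)
churn_label = ((CUTOFF - last_event).dt.days > CHURN_WINDOW_DAYS).astype(int)
# users with no events at all are missing here and need separate handling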
Data quality.
Missing values, outliers, inconsistencies, duplicates. An analyst who knows the data spots issues the ML engineer may miss.
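A few quick pandas checks before handing data over (df, user_id and amount are placeholder names):
print(df.isna().mean().sort_values(ascending=False))    # share of missing values per column
print(df.duplicated(subset=['user_id']).sum())          # duplicated users
print(df['amount'].describe(percentiles=[0.01, 0.99]))  # extreme values in the tails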
Representativeness.
Does the training sample reflect production? Train on data from one region and the model will fail in the others.
Feature engineering
The creative step. Features often matter more than the choice of model.
Analyst contributions:
- Knowledge of relevant features: time-to-first-purchase, recency, frequency, monetary value.
- Understanding what behaviors predict outcomes.
- Domain-specific transformations (seasonality, holidays).
ML engineer contributions:
- Scaling features, encoding categoricals.
- Automated feature generation.
- Efficient pipelines.
The collaboration sweet spot: the analyst proposes features with a business rationale, the ML engineer implements and tests them.
Example: for a churn model the analyst proposes 'days since last app open'. The ML engineer adds it to the pipeline.
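A possible pandas version of that feature, assuming a hypothetical app_opens table with user_id and opened_at columns:
import pandas as pd

snapshot_date = pd.Timestamp('2024-01-01')   # assumed scoring date
last_open = app_opens.groupby('user_id')['opened_at'].max()
days_since_last_app_open = (snapshot_date - last_open).dt.days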
Model evaluation
The ML engineer builds the model and reports accuracy, AUC, precision-recall.
Analyst's job:
1. Translate metrics to business.
- What does an AUC of 0.75 mean for the business?
- What precision-recall trade-off should we target for our business?
2. Segment evaluation.
- Is performance consistent across regions, user types, and products?
- Fairness and bias analysis.
3. Confidence thresholds.
For a binary classifier, ML often reports metrics at several thresholds. The business chooses the trade-off; the analyst helps translate it.
4. Business impact estimation.
'With precision 0.85 and recall 0.60, how much revenue does the model save? What do the false positives cost?'
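A minimal sketch of that estimate; every number below (audience size, save rate, user value, contact cost) is an illustrative assumption, not a benchmark:
# Assumed evaluation: holdout of 10,000 users, 1,000 of whom actually churn
precision, recall = 0.85, 0.60
churners = 1_000
value_per_saved_user = 200     # assumed annual value of a retained user
save_rate = 0.30               # assumed share of contacted churners the campaign retains
cost_per_contact = 5           # assumed cost of contacting one flagged user

true_positives = recall * churners            # 600 churners correctly flagged
flagged = true_positives / precision          # ~706 users contacted in total
revenue_saved = true_positives * save_rate * value_per_saved_user
campaign_cost = flagged * cost_per_contact
print(f'estimated net impact: ${revenue_saved - campaign_cost:,.0f} per year')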
A/B testing ML models
Before deploying, run an A/B test.
Analyst's role:
1. Design experiment.
- What is the control (the existing model or a no-ML baseline)?
- What metric matters?
- Sample size and duration?
- Randomization unit (user, request, session)?
2. Analysis.
- Did the new model beat the baseline significantly? (See the sketch after this list.)
- Guardrails: response time, error rates, UX metrics.
- Segment-level effects: overall positive but hurting some segments?
3. Recommendations.
'Deploy', 'don't deploy', or 'iterate first', based on the evidence.
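For the analysis step, a minimal sketch of the significance check on a conversion-type metric (the counts are illustrative; statsmodels provides the two-proportion z-test):
from statsmodels.stats.proportion import proportions_ztest

# Assumed results: control = existing logic, treatment = new ML model
conversions = [1_180, 1_260]   # converted users in control / treatment
users = [25_000, 25_000]       # users randomized into each group

stat, p_value = proportions_ztest(count=conversions, nobs=users)
print(f'z = {stat:.2f}, p = {p_value:.4f}')
# A low p-value alone is not a 'deploy': check guardrails and segment effects too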
Production monitoring
The model is launched. The work isn't done; it is just beginning.
Things to monitor:
1. Performance metrics.
AUC, precision, recall on production data, compared with the values at training time.
2. Distribution drift.
Are input features changing over time? If so, the model may degrade.
from scipy.stats import ks_2samp
# Compare feature distributions between training and production data
for feature in features:
    stat, p = ks_2samp(training_data[feature], production_data[feature])
    if p < 0.05:
        print(f'{feature}: distribution drift detected')
3. Output drift.
Is the prediction distribution changing? It might signal model decay.
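One common way to quantify it is the population stability index (PSI) over the score distribution; a rough sketch, where the score arrays are assumed inputs and 0.2 is only a rule-of-thumb alert level:
import numpy as np

def psi(expected, actual, bins=10):
    """Population stability index between two distributions of probability scores."""
    edges = np.linspace(0, 1, bins + 1)            # scores assumed to lie in [0, 1]
    e_pct = np.histogram(expected, edges)[0] / len(expected)
    a_pct = np.histogram(actual, edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)             # avoid log(0) for empty bins
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

# training_scores / production_scores are assumed arrays of predicted probabilities
if psi(training_scores, production_scores) > 0.2:
    print('output drift detected')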
4. Business impact.
Is the north-star metric responding to the model? If yes, good. If not, why not?
5. Outcomes validation.
Eventually the model's predictions get validated against reality. Is accuracy holding up?
The analyst typically owns the monitoring dashboards.
Post-deployment iteration
Models aren't static. They need iteration:
- Periodic retraining.
- Feature updates.
- Threshold adjustments.
- Bug fixes.
The analyst's input for iteration:
- Segments where the model performs worst.
- Business changes that may affect the model.
- New feature ideas based on observed behavior.
- Customer feedback correlated with predictions.
Collaboration continues beyond launch.
Common projects
Typical ML projects where the analyst plays a role:
Churn prediction. The analyst defines churn, provides behavior-based features, and evaluates business impact.
Lead scoring. The analyst provides conversion definitions, segments, and interpretability for the sales team.
Recommendation systems. The analyst evaluates engagement, revenue, and diversity.
Pricing optimization. The analyst evaluates the impact on revenue, margin, and win rate.
Fraud detection. The analyst provides labels, measures the precision-recall trade-off and the cost of false positives.
Marketing attribution. The analyst collaborates on the models and provides business context.
Translation skills
The analyst's main value in ML projects is translation.
Between the technical and business languages:
ML engineer: 'ROC AUC 0.82 with threshold 0.5 optimizing F1'.
Translation: 'The model correctly identifies 78% of churned users with 15% false positives. At current settings we save $500k annually on targeted retention campaigns'.
The second is clear to an executive; the first only to ML engineers.
Explainability
ML models are often black boxes. The analyst helps unbox them.
Tools:
SHAP values. Feature contributions to individual predictions.
import shap
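# model is a fitted tree-based model (e.g. XGBoost) and X its feature matrix, assumed from earlier steps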
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)
Feature importance. Built into many models (Random Forest, XGBoost).
Partial dependence plots. The average effect of a feature on predictions.
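scikit-learn can draw these directly; a minimal sketch, assuming model is a fitted classifier and X is a feature DataFrame containing the (hypothetical) columns below:
from sklearn.inspection import PartialDependenceDisplay

# model and X are assumed: a fitted classifier and its feature DataFrame
PartialDependenceDisplay.from_estimator(
    model, X, features=['days_since_last_app_open', 'purchase_frequency']
)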
LIME. Local explanations of individual predictions.
The analyst combines technical explanations with business interpretations for stakeholders.
ML projects are an important part of an analyst's work at the senior level. The Карьерник practice simulator includes tasks on ML, feature engineering, evaluation metrics, and product analytics.
Tools overlap
Often the same tools, used differently:
Python / pandas. The analyst for analysis, the ML engineer for features and training.
SQL. The analyst for queries, the ML engineer for feature extraction.
Jupyter. Both: the analyst explores data, the ML engineer develops models.
Git. The analyst for their own scripts and dashboard SQL, the ML engineer for model code.
dbt. Both: the analyst builds metric layers, the ML engineer pulls features.
MLflow. ML engineer tracks experiments. Analyst consumes.
Similar stacks → easier collaboration.
Division of responsibility
Clear boundaries help:
Analyst owns:
- Business metrics definitions.
- Experiment design / analysis.
- Production monitoring dashboards.
- Stakeholder communication.
ML engineer owns:
- Model architecture, training.
- Production deployment.
- Feature pipelines.
- Technical monitoring.
Shared:
- Feature engineering ideation.
- Data quality.
- Evaluation strategy.
- Post-deployment iteration.
Flexible in practice; it depends on team composition.
Career intersection
Multiple paths:
Analyst → ML analyst. Specialize in ML-adjacent work.
Analyst → Applied DS. Expand skills, become an ML practitioner.
Analyst → ML Engineer. Rare; requires significant technical growth.
Stay an analyst and collaborate. Become the recognized go-to analyst for ML projects.
Growth opportunities abound. Choose based on interests (business vs technical).
Learning ML as an analyst
Worth learning even if you are not switching roles:
Basics:
- Regression, classification.
- Train/test split, cross-validation.
- Evaluation metrics (MAE, RMSE, AUC, precision-recall).
- Bias-variance trade-off.
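A tiny end-to-end sketch of those basics on synthetic data, using scikit-learn only (nothing here is production code):
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import roc_auc_score

# Generated, imbalanced toy data: roughly 90% negatives, 10% positives
X, y = make_classification(n_samples=5_000, weights=[0.9], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
print('CV AUC:', cross_val_score(model, X_train, y_train, cv=5, scoring='roc_auc').mean())
print('Holdout AUC:', roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))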
Intermediate:
- Tree-based methods (Random Forest, Gradient Boosting).
- Feature engineering patterns.
- Hyperparameter tuning.
Advanced (optional):
- Neural networks basics.
- Specific algorithms (survival, recommendation, ranking).
Courses: Andrew Ng's ML course, fast.ai, Google ML crash course, books (ISLR, Hands-On ML).
Practical projects (Kaggle) — learn by doing.
Common mistakes
'ML solves everything.' It doesn't. Simple heuristics are often competitive; A/B test ML against the baseline.
Skipping data quality. 'The model doesn't work? Must be the code.' No: the data is the first culprit.
Over-trusting accuracy. High accuracy on imbalanced data can be misleading. 99% accuracy from always predicting 'no churn' when 1% of users actually churn is useless.
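Spelling that arithmetic out:
users = 100_000
churners = int(users * 0.01)            # 1% of users actually churn

# A "model" that always predicts "no churn"
accuracy = (users - churners) / users   # 0.99, looks impressive
recall = 0 / churners                   # 0.0, it catches no churners at all
print(f'accuracy = {accuracy:.0%}, recall = {recall:.0%}')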
Ignoring drift. Deploying a model once and forgetting about it. Production monitoring is critical.
No clear business metric. An 'accurate model' is not enough. An accurate model that hurts the business is a failure.
Skipping explainability. Stakeholders don't trust black boxes. Invest in explanations.
FAQ
Do you need a PhD to work in ML?
For research, yes. For applied ML, no. Good engineers and analysts with experience compete with PhDs in industry.
Is data more important than the model?
Usually yes. 'Garbage in, garbage out.' Spending 70% of the effort on data and 30% on the model is typical.
Is an analyst without ML stuck in their career?
Not necessarily. A pure analytics career is viable, but ML knowledge opens more roles and higher seniority levels.
When is ML not needed?
When simple rules suffice, the data is small, explainability is critical, or there are regulatory constraints. Don't force ML everywhere.