ML vs Deep Learning for Analysts
Why this matters
"Deep learning is everything." It is not. On tabular data, classical ML often beats deep learning.
An analyst should know the difference, so as not to spend months on a neural network when XGBoost would solve the problem in a day.
Classical ML
Algorithms
- Linear / Logistic regression
- Decision trees
- Random Forest
- Gradient boosting (XGBoost, LightGBM, CatBoost)
- SVM
- KNN
- Naive Bayes
Characteristics
- Manual feature engineering
- Interpretability is usually decent
- Works on small to medium datasets
- Fast training and prediction
- Proven on tabular data
Deep learning
Models
- Multi-layer perceptrons (MLP)
- Convolutional (CNN) — images
- Recurrent (RNN, LSTM) — sequences
- Transformers — language, advanced
Characteristics
- Automatic feature extraction
- Hard to interpret
- Requires large datasets
- Usually needs GPUs, so it is expensive
- State of the art for images, NLP, and speech
Choosing
Tabular data
Winner: XGBoost / LightGBM / CatBoost (classical).
Kaggle results show this consistently.
Neural networks can be competitive, but usually are not.
Exception: massive datasets (10M+ rows) with complex patterns.
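To make "gradient boosting" concrete, here is a toy sketch of the idea behind XGBoost-style models: each round fits a one-split tree (a stump) to the current residuals and adds a shrunken copy of it to the ensemble. This is not how XGBoost, LightGBM, or CatBoost are implemented (they add regularization, histogram binning, and much more). The function names, the synthetic data, and the hyperparameters are assumptions chosen for illustration.

```python
import random

def fit_stump(X, residuals):
    """Find the single (feature, threshold) split that minimizes squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values,
        # so both sides of every split are non-empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, left_mean, right_mean)
    return best[1:]  # (feature index, threshold, left value, right value)

def predict_stump(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value

def fit_boosting(X, y, n_rounds=20, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, X)]
    return base, stumps, learning_rate

def predict(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Synthetic tabular data with a non-linear, step-shaped target (made up).
random.seed(0)
X = [[random.uniform(0, 10), random.uniform(0, 10)] for _ in range(120)]
y = [(5.0 if a > 5 else 0.0) + (3.0 if b > 2 else 0.0) for a, b in X]

model = fit_boosting(X, y)
baseline_mse = mse(y, [sum(y) / len(y)] * len(y))
boosted_mse = mse(y, [predict(model, row) for row in X])
print(f"mean-only MSE: {baseline_mse:.3f}, boosted MSE: {boosted_mse:.3f}")
```

The errors here are measured on the training data only, so they show the mechanism rather than real predictive quality. In practice you would call one of the libraries above and evaluate on held-out data.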
Images
Deep learning (CNNs). No competition.
Text
Before 2018: classical models on TF-IDF features.
Since 2018: transformers (BERT, GPT) dominate.
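In the classical text pipeline, each document is turned into a TF-IDF vector before any model sees it. A minimal from-scratch sketch, assuming whitespace tokenization and the plain `tf * log(N / df)` weighting (libraries such as scikit-learn use smoothed variants, so their numbers differ). The toy documents are made up.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

docs = [
    "win a free prize now",
    "meeting notes for the team",
    "free lunch for the team",
]
vectors = tfidf(docs)

# "prize" occurs in one document, "free" in two, so "prize" gets more weight.
print(vectors[0]["prize"], vectors[0]["free"])
```

These sparse vectors are what a logistic regression or Naive Bayes model would be trained on in the pre-2018 setup.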
Audio / speech
Deep learning.
Time series
Mixed. Classical methods (Prophet, ARIMA) are often enough. Neural networks (LSTM) rarely bring a meaningful improvement.
For the analyst
Most analyst roles
Classical ML is enough.
Data scientist
Deep learning is expected, especially in specialized areas (CV, NLP).
Research
Deep learning is at the forefront.
Complexity
Learning curve
- Classical: 3-6 months to become proficient
- Deep learning: 6-12 months for the basics, years for mastery
Compute
- Classical: a laptop
- Deep learning: often a GPU
Time to train
- Classical: seconds to minutes
- Deep learning: hours to days
Deployment
- Classical: straightforward
- Deep learning: more complex (ONNX, quantization)
Interpretability
Classical
- Linear models: read directly from the coefficients
- Trees: feature importance + SHAP
- Others: SHAP / LIME
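For linear models, "directly" means the fitted coefficient is itself the explanation: it says how much the prediction moves per unit of the feature. A minimal sketch with one feature and ordinary least squares in closed form. The variable names and the numbers are invented for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: ad spend (in thousands) vs. weekly orders.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
orders = [12.0, 14.0, 16.0, 18.0, 20.0]

intercept, slope = fit_line(ad_spend, orders)
print(f"Each extra unit of ad spend adds about {slope:.1f} orders "
      f"(baseline {intercept:.1f}).")
```

A tree ensemble has no single number like `slope`, which is why it needs feature importance or SHAP to answer the same question.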
Deep learning
Harder. Grad-CAM, attention visualization, and similar tools exist, but their explanations are less conclusive.
When the business requires explanations (regulatory and similar cases), classical models have the advantage.
Examples
Churn prediction
Classical (XGBoost). Typically 10k-1M users with tabular features.
Image classification
Deep learning (CNN). The textbook use case.
Spam detection
Historically classical; today a fine-tuned BERT.
Recommendation
Hybrid: classical collaborative filtering plus neural models.
Sales forecasting
Classical (Prophet / XGBoost) is usually sufficient.
Compute economics
Classical
$0-100 / month training on cloud.
Deep learning
$100-10000+ / month for serious projects.
LLM fine-tuning: thousands of dollars.
Budget matters.
Hybrid
Often the best option: classical models for the core tasks, plus neural networks for special cases.
Example:
- Feature extraction — pre-trained NN (text embeddings)
- Classifier — XGBoost on embeddings
Combine strengths.
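The shape of that hybrid pipeline is two independent stages: a frozen feature extractor that turns raw text into fixed-length vectors, and a separately trained classifier on top. The sketch below keeps that shape but swaps in toy stand-ins so it runs with the standard library only: `embed` is a letter-count vector, not a pre-trained network, and `CentroidClassifier` is a nearest-centroid model, not XGBoost. Both names and the sample texts are hypothetical.

```python
def embed(text):
    """Stage 1 stand-in: a 26-dimensional letter-count vector.
    A real pipeline would call a pre-trained embedding model here."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

class CentroidClassifier:
    """Stage 2 stand-in: nearest-centroid classifier.
    A real pipeline would train XGBoost on the vectors instead."""

    def fit(self, vectors, labels):
        sums, counts = {}, {}
        for vec, label in zip(vectors, labels):
            acc = sums.setdefault(label, [0.0] * len(vec))
            for i, v in enumerate(vec):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        # One mean vector per class.
        self.centroids = {
            label: [s / counts[label] for s in acc]
            for label, acc in sums.items()
        }
        return self

    def predict(self, vec):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(vec, centroid))
        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))

train_texts = ["refund my order", "refund please", "love this product", "love it"]
train_labels = ["complaint", "complaint", "praise", "praise"]

# Stage 1 runs once per text; stage 2 only ever sees the vectors.
clf = CentroidClassifier().fit([embed(t) for t in train_texts], train_labels)
prediction = clf.predict(embed("please refund"))
print(prediction)
```

Because the two stages only share the vector format, either one can be replaced without touching the other, which is the practical appeal of the hybrid setup.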
When neural networks win on tabular data
- 10M+ rows
- Complex non-linear interactions
- High-cardinality features (embeddings help)
- Generative needs
Otherwise, use XGBoost.
Tools
Classical
- scikit-learn
- XGBoost, LightGBM, CatBoost
- statsmodels
Deep learning
- PyTorch (research standard)
- TensorFlow / Keras (enterprise)
- HuggingFace (pre-trained models)
AutoML
- AutoGluon
- H2O
- FLAML
Combines both.
In the interview
"Deep learning vs classical?"
It depends on the task:
- Tabular → classical
- Image / text / audio → deep learning
- Limited data → classical
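The same rule of thumb can be written down as a tiny lookup, which is a handy way to rehearse the answer. The function name, the string labels, and the choice to let limited data override the data type are assumptions made for this sketch.

```python
def suggest_model_family(data_type, limited_data=False):
    """Rule of thumb only; it does not replace benchmarking on real data."""
    # Assumption: limited data takes priority over the data type.
    if limited_data:
        return "classical"
    if data_type == "tabular":
        return "classical"
    if data_type in {"image", "text", "audio"}:
        return "deep learning"
    raise ValueError(f"unknown data type: {data_type!r}")

for case in [("tabular", False), ("image", False), ("text", True)]:
    print(case, "->", suggest_model_family(*case))
```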
"Why is XGBoost popular?"
It handles tabular data excellently: fast, accurate, and interpretable enough.
"Are neural networks always better?"
No. They are often overkill, and classical models are often sufficient.
Roles
Data analyst
Classical basics. You use models rather than build deep ones.
Data scientist (IC)
Both. Classical daily, deep learning for specific tasks.
ML engineer
Deep learning is often the primary tool.
Research
Deep learning is at the forefront.
Tabular + LLMs
From 2024 on, LLMs can be applied to tabular data:
zero-shot classification and embedding extraction.
Useful for small-data cases. XGBoost still wins on large tabular datasets.
Recommendation for analysts
Months 1-6
Classical ML foundations: logistic regression, trees, XGBoost.
Months 7-12
If you are interested, add neural network basics. The fast.ai course is a good start.
Year 2+
Specialize. Pick an area (CV / NLP / time series) if it is relevant to your work.
For most analyst roles, classical ML is enough.
Learning resources
Classical
- "Hands-On Machine Learning" by Aurélien Géron
- scikit-learn docs
- Kaggle courses
Deep learning
- fast.ai
- Andrej Karpathy videos
- Deep Learning book (Goodfellow et al.)
Applied ML for analysts
- Coursera: Applied Data Science with Python
- Stanford CS229 (classical)
- Stanford CS231n (CNN)
Ethics
Both approaches have bias issues. Explainability is harder with neural networks.
Regulated industries (finance, health) prefer classical models for accountability.
FAQ
Do I have to know both?
Classical: yes. Deep learning: depends on the role.
Will LLMs replace everything?
Not on tabular data. They fit specific niches.
How do I pivot to deep learning?
Start with the fast.ai course. It is practical.