ML vs Deep Learning for Analysts

Карьерник is a Telegram quiz trainer with 1500+ analyst interview questions: SQL, Python, A/B testing, metrics. Free.

Why know this

«Deep learning is everything.» No. On tabular data, classical ML often beats deep learning.

An analyst should know the difference, so as not to spend months on a neural network when XGBoost would solve the problem in a day.

Classical ML

Algorithms

  • Linear / Logistic regression
  • Decision trees
  • Random Forest
  • Gradient boosting (XGBoost, LightGBM, CatBoost)
  • SVM
  • KNN
  • Naive Bayes

Characteristics

  • Manual feature engineering
  • Usually decent interpretability
  • Works well on small-to-medium data
  • Fast training and prediction
  • Proven on tabular data

Deep learning

Models

  • Multi-layer perceptrons (MLP)
  • Convolutional (CNN) — images
  • Recurrent (RNN, LSTM) — sequences
  • Transformers — language (state of the art)

Characteristics

  • Automatic feature extraction
  • Hard to interpret
  • Requires large datasets
  • Needs GPUs / expensive
  • State of the art for images, NLP, speech

Choosing

Tabular data

Winner: XGBoost / LightGBM / CatBoost (classical).

Kaggle leaderboards show this consistently.

A neural network can be competitive, but usually isn't.

Exception: massive datasets (10M+ rows) with complex patterns.

Images

Deep learning (CNNs). No competition.

Text

Before 2018: classical models with TF-IDF features.

Since 2018: transformers (BERT, GPT) dominate.
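The pre-2018 classical recipe is easy to sketch with scikit-learn: TF-IDF features plus a linear classifier. The toy corpus below is made up for illustration; a real task would use thousands of labeled texts.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus (illustrative), 1 = spam, 0 = ham
texts = ["win a free prize now", "claim your free money",
         "meeting moved to 3pm", "see you at lunch",
         "free offer click now", "project update attached"]
labels = [1, 1, 0, 0, 1, 0]

# Classic pre-2018 text pipeline: TF-IDF vectors + linear classifier
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

predictions = model.predict(["free prize now", "lunch meeting today"])
print(predictions)
```

This stack is still a strong, cheap baseline; transformers win when you need semantics beyond word overlap.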

Audio / speech

Deep learning.

Time series

Mixed. Classical methods (Prophet, ARIMA) are often enough; neural networks (LSTM) rarely bring a meaningful improvement.
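As a sketch of how far even the simplest classical baseline gets you: a seasonal-naive forecast (repeat the last observed season) on a hypothetical weekly-seasonal series. The data is synthetic and no Prophet or ARIMA is needed for the baseline.

```python
import numpy as np

# Hypothetical daily series: trend + weekly seasonality + noise
rng = np.random.default_rng(0)
days = np.arange(120)
series = (100 + 0.5 * days
          + 10 * np.sin(2 * np.pi * days / 7)
          + rng.normal(0, 2, days.size))

train, test = series[:-14], series[-14:]

# Seasonal-naive baseline: repeat the last observed week, two weeks ahead
season = 7
forecast = np.tile(train[-season:], 2)

mae = np.mean(np.abs(forecast - test))
print(f"Seasonal-naive MAE: {mae:.2f}")
```

Any fancier model (Prophet, ARIMA, LSTM) has to beat this baseline to justify its complexity.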

For analysts

Most analyst roles

Classical ML is enough.

Data scientist

Deep learning is expected, especially in specialized areas (CV, NLP).

Research

Deep learning is at the forefront.

Difficulty

Learning curve

  • Classical: 3-6 months to proficiency
  • Deep learning: 6-12 months for the basics, years for mastery

Compute

  • Classical: a laptop is enough
  • Deep learning: often needs a GPU

Time to train

  • Classical: seconds to minutes
  • Deep learning: hours to days

Deployment

  • Classical: straightforward
  • Deep learning: more complex (ONNX, quantization)
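For classical models, "straightforward" often means plain serialization. A sketch with scikit-learn and pickle (model and data are illustrative; in production you'd typically use joblib and pin library versions):

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)

# "Deployment" can be as simple as serializing the fitted model
blob = pickle.dumps(model)
restored = pickle.loads(blob)

same = (restored.predict(X) == model.predict(X)).all()
print("Round-trip predictions identical:", same)
```

A deep learning model instead goes through export formats (ONNX), quantization, and often dedicated serving infrastructure.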

Interpretability

Classical

  • Linear models: coefficients are directly readable
  • Trees: feature importance + SHAP
  • Others: SHAP / LIME
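A sketch of these options using only scikit-learn (SHAP itself is a separate library; permutation importance below is a model-agnostic alternative in the same spirit; data and models are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=5,
                           n_informative=3, random_state=0)

# Linear model: coefficients are directly readable
lr = LogisticRegression().fit(X, y)
print("Logistic coefficients: ", np.round(lr.coef_[0], 2))

# Tree ensemble: built-in impurity-based feature importance
rf = RandomForestClassifier(random_state=0).fit(X, y)
print("RF feature importance: ", np.round(rf.feature_importances_, 2))

# Model-agnostic (same idea as SHAP/LIME): permutation importance
perm = permutation_importance(rf, X, y, n_repeats=5, random_state=0)
print("Permutation importance:", np.round(perm.importances_mean, 2))
```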

Deep learning

Harder. Grad-CAM, attention visualization, and similar tools exist, but they are less conclusive.

If interpretability is a business requirement (regulation, etc.), classical models have the advantage.

Examples

Churn prediction

Classical (XGBoost). 10k-1M users, tabular features.

Image classification

Deep learning (CNN). Classic task.

Spam detection

Historically: classical. Today: fine-tuned BERT.

Recommendation

Hybrid: classical collaborative filtering plus neural components.

Sales forecasting

Classical (Prophet / XGBoost) is usually sufficient.

Compute economics

Classical

$0-100 per month for training in the cloud.

Deep learning

$100-10,000+ per month for serious projects.

LLM fine-tuning: thousands of dollars.

Budget matters.

Hybrid

Often the best option: classical models for the basics, plus a neural network for special cases.

Example:

  • Feature extraction — a pre-trained NN (text embeddings)
  • Classifier — XGBoost on top of the embeddings

Combine strengths.

When neural networks win on tabular data

  • 10M+ rows
  • Complex non-linear interactions
  • High-cardinality features (embeddings help)
  • Generative needs

Otherwise — XGBoost.

Tools

Classical

  • scikit-learn
  • XGBoost, LightGBM, CatBoost
  • statsmodels

Deep learning

  • PyTorch (research standard)
  • TensorFlow / Keras (enterprise)
  • HuggingFace (pre-trained models)

AutoML

  • AutoGluon
  • H2O
  • FLAML

These combine both approaches.

In interviews

«Deep learning vs classical?»

It depends on the task:

  • Tabular → classical
  • Images / text / audio → deep learning
  • Limited data → classical

«Why is XGBoost so popular?»

It handles tabular data excellently. Fast, accurate, and interpretable enough.

«Are neural networks always better?»

No. They're often overkill; classical models are often sufficient.

Roles

Data analyst

Classical basics. Uses models, doesn't build deep ones.

Data scientist (IC)

Both. Classical daily, deep learning for specific tasks.

ML engineer

Deep learning is often the primary tool.

Research

Deep learning is at the forefront.

Tabular + LLMs

2024+: LLMs can be applied to tabular data.

Zero-shot classification, embedding extraction.

Useful for small-data cases. XGBoost still wins on large tabular datasets.

Recommendation for analysts

Months 1-6

Classical ML foundations: logistic regression, trees, XGBoost.

Months 7-12

If you're interested, add neural network basics. The fast.ai course is a good start.

Year 2+

Specialize. Pick an area (CV / NLP / time series, if relevant to your work).

For most analyst roles, classical is enough.

Learning

Classical

  • «Hands-On Machine Learning» — Aurélien Géron
  • scikit-learn docs
  • Kaggle courses

Deep learning

  • fast.ai
  • Andrej Karpathy videos
  • Deep Learning book (Goodfellow et al.)

Applied ML for analysts

  • Coursera: Applied Data Science w/ Python
  • Stanford CS229 (classical)
  • Stanford CS231n (CNN)

Ethics

Both have bias issues; explainability is harder for neural networks.

Regulated industries (finance, healthcare) prefer classical models for accountability.

FAQ

Do you have to know both?

Classical — yes. Deep learning — depends on the role.

Will LLMs replace everything?

Not for tabular data. They win in specific niches.

How do I pivot to deep learning?

Start with the fast.ai course. It's practical.


Practice — open the trainer with 1500+ interview questions.