Feature importance: SHAP vs Gain

Карьерник is a quiz trainer in Telegram with 1500+ questions for analyst interviews. SQL, Python, A/B testing, metrics. Free.

Why you need to know this

"Which features matter most in the model?" is a question stakeholders ask at every ML project presentation. An analyst who answers "here is the gain importance" will get side-eye from the ML engineer, because gain can be misleading.

SHAP is the modern standard for interpretability. A middle+ analyst should know the difference.

What feature importance is

A measure of how much each feature contributes to the model's predictions.

It helps you:

  • Interpret the model
  • Explain it to stakeholders
  • Debug (a weirdly important feature → data issue?)
  • Do feature selection

Methods

1. Gain (tree-based)

For tree models (Random Forest, XGBoost).

The sum of the information gain from every split a feature participates in. The default in XGBoost.

Python:

model.feature_importances_

Plus: fast, built-in. Minus:

  • Biased toward high-cardinality features
  • Global, not per-prediction
  • Not intuitive (what is "gain", exactly?)

2. Split

Counts how often a feature is used in splits.

Has the same issues as gain.
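The cardinality bias is easy to reproduce. A sketch below (synthetic data, made-up feature names) fits a random forest where one feature is a genuinely predictive binary flag and the other is pure continuous noise; impurity-based importance still hands the noise feature a noticeable share:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000
x_binary = rng.integers(0, 2, n)  # informative, low-cardinality flag
x_noise = rng.random(n)           # pure noise, effectively unique per row
# target follows the flag with 20% label noise
y = np.where(rng.random(n) < 0.2, 1 - x_binary, x_binary)

X = np.column_stack([x_binary, x_noise])
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

imp = dict(zip(["binary", "noise"], model.feature_importances_))
print(imp)  # the noise feature grabs a noticeable share of the "importance"
```

With deep trees, the noise column soaks up impurity reductions from fitting the label noise, which is exactly how gain-style scores end up inflated for high-cardinality features.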

3. Permutation importance

Shuffle a column's values → check how much performance drops.

from sklearn.inspection import permutation_importance

r = permutation_importance(model, X_val, y_val, n_repeats=10)

Plus: model-agnostic, less biased. Minus: computationally expensive.
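Permutation importance scores a pure-noise feature near zero, because shuffling it barely changes held-out performance. A runnable sketch (synthetic data, made-up feature names):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
x_binary = rng.integers(0, 2, n)  # informative flag
x_noise = rng.random(n)           # pure noise
y = np.where(rng.random(n) < 0.2, 1 - x_binary, x_binary)
X = np.column_stack([x_binary, x_noise])

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# score drop on held-out data when each column is shuffled
r = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)
print(r.importances_mean)  # large drop for the flag, near zero for the noise
```

Because the drop is measured on validation data, a feature the model overfit during training gets no credit it didn't earn.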

4. SHAP (SHapley Additive exPlanations)

Based on game theory: each feature gets a fair share of the credit.

For each prediction:

prediction(x) = baseline + sum(SHAP values)

A SHAP value is a feature's contribution to that specific prediction; the baseline is the average model output.

import shap

explainer = shap.Explainer(model)
shap_values = explainer(X)

# Global importance
shap.summary_plot(shap_values, X)
# Per-prediction
shap.waterfall_plot(shap_values[0])

Plus: theoretically sound, works per-prediction. Minus: slow on big data.
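The additive formula above can be checked by hand. A stdlib-only sketch (the toy model, the point x, and the zero reference are all made up for illustration) computes exact Shapley values by enumerating coalitions and verifies that baseline + sum of the values equals the prediction:

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, reference):
    """Exact Shapley values for f at point x against a reference point."""
    n = len(x)
    players = range(n)

    def v(S):
        # features in coalition S take their actual value, the rest the reference value
        return f([x[i] if i in S else reference[i] for i in players])

    phi = []
    for i in players:
        others = [j for j in players if j != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (v(set(S) | {i}) - v(set(S)))  # weighted marginal contribution
        phi.append(total)
    return phi

f = lambda x: 2 * x[0] + x[1] * x[2]  # toy model with an interaction term
x, ref = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(f, x, ref)
baseline = f(ref)
# additivity: baseline + sum of SHAP values reconstructs the prediction
print(phi, baseline + sum(phi), f(x))
```

This brute force enumerates 2^n coalitions per feature, which is exactly why practical SHAP relies on approximations such as TreeSHAP and KernelSHAP.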

Gain vs SHAP: the key difference

Gain

  • Tree-specific
  • Global
  • Biased (high-cardinality features look "more important")
  • One scalar per feature

SHAP

  • Model-agnostic
  • Per-prediction + aggregate
  • Theoretically principled
  • Direction of effect (positive vs negative)

An example of the difference

A churn prediction model.

Gain:

  • age — 25%
  • days_since_last_login — 20%
  • country — 15%
  • income — 10%

The SHAP summary shows that:

  • days_since_last_login actually has the highest impact
  • age is inflated because of its high cardinality

SHAP often corrects where gain misleads.

Per-prediction insights

SHAP can explain an individual prediction:

"Why was this user predicted to churn?"

  • High: days_since_login = 30 (+20% churn)
  • Medium: no premium = yes (+5%)
  • Low: recent support ticket (-2%)

Gain can't do this. SHAP can.

Tools

shap

import shap

# Explain model
explainer = shap.Explainer(model, X_train)
shap_values = explainer(X_test)

# Plots
shap.summary_plot(shap_values, X_test)
shap.waterfall_plot(shap_values[0])
shap.dependence_plot('feature_name', shap_values.values, X_test)  # expects a raw array

eli5

An alternative with simpler syntax.

lime

Local explanations, similar in spirit to SHAP, but with weaker theoretical guarantees.

In analytics

Business communication

SHAP plots are easy to show to the business: "these factors drive churn".

Model debugging

Weird SHAP values → data issue?

Feature engineering

SHAP reveals interactions and non-linearities → ideas for new features.

Causal inference?

SHAP shows correlation, not causation. Be careful!

Common mistakes

Interpreting gain as causal

"Age increases churn" does not follow from gain. It's correlation.

Ignoring context

A single number can be misleading. Always look at the distributions.

Skipping per-prediction analysis

Global importance often masks useful per-user patterns.

Forgetting direction

A feature being "important" does not mean a positive impact. It could go either way.

In the interview

"How do you interpret feature importance?" Gain for a quick look, SHAP for rigorous interpretation.

"Problems with gain?" Biased toward high cardinality, tree-specific, global only.

"SHAP in a nutshell?" Game theory, fair attribution, per-prediction + global.

"Causation?" No. SHAP shows correlation with the prediction.

FAQ

SHAP for large datasets?

Slow. Subsample, or use TreeSHAP (much faster for tree models).

Is permutation importance enough?

A good baseline, often sufficient. Reach for SHAP when you need per-prediction explanations.

Neural nets?

SHAP works (DeepSHAP). Or integrated gradients.


Practice ML: open the trainer with 1500+ interview questions.