Feature importance: SHAP vs Gain
Карьерник is a Telegram quiz trainer with 1500+ analyst interview questions: SQL, Python, A/B testing, metrics. Free.
Why this matters
"Which features matter most in the model?" is a question stakeholders ask at every ML project presentation. An analyst who answers "here is the gain importance" will get a side-eye from the ML engineer, because gain can be misleading.
SHAP is the modern standard for interpretability. A middle+ analyst should know the difference.
What feature importance is
A measure of how much each feature contributes to the model's predictions.
It helps you:
- Interpret the model
- Explain it to stakeholders
- Debug (a suspiciously important feature → data issue?)
- Select features
Methods
1. Gain (tree-based)
For tree models (Random Forest, XGBoost).
The sum of information gain across all splits that use the feature. The default in XGBoost.
Python:
model.feature_importances_
Pros: fast, built-in. Cons:
- Biased toward high-cardinality features
- Global, not per-prediction
- Not intuitive (what exactly is "gain"?)
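For Random Forest, scikit-learn's feature_importances_ is the impurity-based analogue of gain. A minimal sketch on synthetic data (sizes, seeds, and feature names are illustrative, not from the original):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 5 features, only 2 of them informative
X, y = make_classification(n_samples=500, n_features=5, n_informative=2,
                           n_redundant=0, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based (gain-style) importance: one global scalar per feature
for i, imp in enumerate(model.feature_importances_):
    print(f"feature_{i}: {imp:.3f}")
```

Note that the output is a single global number per feature, normalized to sum to 1, which is exactly the "global, not per-prediction" limitation above.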
2. Split
Counts how often a feature is used for a split.
Similar issues to gain.
3. Permutation importance
Shuffle a column's values → check the performance drop.
from sklearn.inspection import permutation_importance
r = permutation_importance(model, X_val, y_val, n_repeats=10)
Pros: model-agnostic, less biased. Cons: computationally expensive.
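A self-contained version of the same call, trained end-to-end on synthetic data (all names and sizes are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each column n_repeats times and measure the validation-score drop
r = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)
for i in r.importances_mean.argsort()[::-1]:
    print(f"feature_{i}: {r.importances_mean[i]:.3f} +/- {r.importances_std[i]:.3f}")
```

Running it on held-out data (X_val, y_val) matters: on training data the score drop also rewards memorized noise.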
4. SHAP (SHapley Additive exPlanations)
Based on game theory: fair credit attribution.
For each prediction:
P(x) = baseline + sum(SHAP values)
A SHAP value is a feature's contribution to a specific prediction.
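The additivity identity above can be checked by hand for a linear model, where SHAP values have a known closed form phi_i = w_i * (x_i - E[x_i]) (assuming independent features). A numpy-only sketch with made-up weights, no shap library needed:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
w, b = np.array([2.0, -1.0, 0.5]), 0.3  # illustrative linear model f(x) = w.x + b

def predict(X):
    return X @ w + b

baseline = predict(X).mean()        # E[f(X)], the "baseline" term
x = X[0]
phi = w * (x - X.mean(axis=0))      # exact SHAP values for a linear model

# Additivity: prediction = baseline + sum of per-feature contributions
print(predict(x[None])[0], baseline + phi.sum())
```

The two printed numbers coincide, which is the point: SHAP values decompose each individual prediction exactly, not just on average.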
import shap
explainer = shap.Explainer(model)
shap_values = explainer(X)
# Global importance
shap.summary_plot(shap_values, X)
# Per-prediction
shap.waterfall_plot(shap_values[0])
Pros: theoretically sound, per-prediction. Cons: slow for big data.
Gain vs SHAP: the key difference
Gain
- Tree-specific
- Global
- Biased (high-cardinality features look "more important")
- One scalar per feature
SHAP
- Model-agnostic
- Per-prediction + aggregate
- Theoretically principled
- Direction of effect (positive vs negative)
Example of the difference
A churn prediction model.
Gain:
- age: 25%
- days_since_last_login: 20%
- country: 15%
- income: 10%
The SHAP summary shows that:
- days_since_last_login: actually the highest impact
- age: inflated because of its high cardinality
SHAP often corrects gain's misleading picture.
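The high-cardinality bias is easy to reproduce with scikit-learn: an uninformative ID-like column picks up inflated impurity (gain-style) importance, while permutation importance on held-out data puts it back near zero. A synthetic sketch (all names and sizes are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
signal = rng.integers(0, 2, n)                     # genuinely predictive feature
noise_id = rng.integers(0, 1000, n)                # high-cardinality pure noise
y = (signal ^ (rng.random(n) < 0.1)).astype(int)   # label = signal with 10% flips

X = np.column_stack([signal, noise_id])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

gain = model.feature_importances_                  # impurity-based (gain-style)
perm = permutation_importance(model, X_te, y_te, n_repeats=10,
                              random_state=0).importances_mean
print("gain:", gain, "permutation:", perm)
```

The noise column gets a sizable share of the gain-style importance because the trees can split on its 1000 distinct values to overfit, but permuting it barely hurts held-out accuracy.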
Per-prediction insights
SHAP can explain a specific prediction:
"Why was this user predicted to churn?"
- High: days_since_login = 30 (+20% churn)
- Medium: no premium = yes (+5%)
- Low: recent support ticket (-2%)
Gain can't do this. SHAP can.
Tools
shap
import shap
# Explain model
explainer = shap.Explainer(model, X_train)
shap_values = explainer(X_test)
# Plots
shap.summary_plot(shap_values, X_test)
shap.waterfall_plot(shap_values[0])
shap.dependence_plot('feature_name', shap_values.values, X_test)
eli5
An alternative with simpler syntax.
lime
Local explanations, similar in spirit to SHAP, but with weaker theoretical grounding.
In analytics
Business communication
SHAP plots are easy to show to the business: "these factors drive churn".
Model debugging
A weird SHAP pattern → data issue?
Feature engineering
SHAP reveals interactions and non-linearities → ideas for new features.
Causal inference?
SHAP shows correlation, not causation. Be careful!
Common mistakes
Interpreting gain as causal
"Age increases churn" does not follow from gain. It's correlation.
Ignoring context
A single value can be misleading. Always look at the distributions.
Skipping per-prediction analysis
Global importance often masks useful per-user patterns.
Forgetting direction
A feature being "important" doesn't mean a positive impact. It could go either way.
In the interview
"How do you interpret feature importance?" Gain for a quick look, SHAP for rigorous interpretation.
"What's wrong with gain?" Biased toward high-cardinality features, tree-specific, global only.
"SHAP in a nutshell?" Game theory, fair attribution, per-prediction + global.
"Does it show causation?" No. SHAP shows correlation with the prediction.
Related topics
- What feature importance is
- What a feature is in ML
- XGBoost vs Random Forest
- Classification vs regression
- Multicollinearity
FAQ
SHAP on large datasets?
Slow. Subsample, or use TreeSHAP (much faster for tree models).
Is permutation importance enough?
A good baseline, often sufficient. Use SHAP when you need per-prediction explanations.
Neural nets?
SHAP works (DeepSHAP). Or use integrated gradients.