Lead scoring for analysts
Карьерник is a quiz trainer in Telegram with 1500+ questions for analyst interviews: SQL, Python, A/B testing, metrics. Free.
Why you need to know this
In B2B, SaaS, and fintech, analysts build lead scoring: they rank potential customers by their probability of closing. With good scoring the sales team focuses on the right leads, which means more revenue.
B2B companies (Контур, SkyEng, B2B-focused teams) ask about it in interviews because lead scoring is part of the daily work there.
What lead scoring is
Lead = a potential customer who has shown interest.
Lead scoring = assigning each lead a score that indicates its conversion probability.
Sales uses the score to set priorities: 100/100 is a hot lead to call immediately, 20/100 goes to nurturing rather than an immediate call.
Types
Rule-based
Manual rules:
- Downloaded whitepaper: +10
- Visited pricing page: +15
- Enterprise company: +20
- No budget info: -10
Total score = sum of the points.
Pros: simple, interpretable. Cons: inflexible, can miss patterns.
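The rule-based approach above can be sketched in a few lines. This is a minimal illustration, assuming each lead arrives as a dict of boolean flags; the flag names are hypothetical, while the point values are the ones from the list above.

```python
# Point values copied from the manual rules listed above.
# The flag names are hypothetical; use whatever your CRM export provides.
RULES = {
    "downloaded_whitepaper": 10,
    "visited_pricing_page": 15,
    "enterprise_company": 20,
    "no_budget_info": -10,
}


def rule_based_score(lead: dict) -> int:
    """Sum the points of every rule whose flag is truthy on the lead."""
    return sum(points for flag, points in RULES.items() if lead.get(flag))


lead = {
    "downloaded_whitepaper": True,
    "visited_pricing_page": True,
    "no_budget_info": True,
}
print(rule_based_score(lead))  # 10 + 15 - 10 = 15
```

Keeping the rules in one dict makes them easy to review with the sales team, which is the main selling point of this approach.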
Model-based
ML model predicts conversion probability.
Features:
- Behavioral (pages visited, emails opened)
- Demographic (company size, industry, role)
- Engagement (time on site, frequency)
- Explicit (budget, timeline, need)
Target: did the lead convert within N days?
Model: logistic regression, random forest, XGBoost.
Hybrid
Rules as guardrails + a model for fine-grained ranking.
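One possible shape of a hybrid score, as a sketch only: the model supplies a probability, and hand-written rules clamp the result. Both guardrails below (the `disqualified` flag and the cap at 89 for leads without budget info) are hypothetical examples, not rules from any specific CRM.

```python
def hybrid_score(model_probability: float, lead: dict) -> float:
    """Return a 0-100 score: model output adjusted by hand-written guardrails."""
    # Hypothetical guardrail: a disqualified lead never reaches sales.
    if lead.get("disqualified"):
        return 0.0
    score = model_probability * 100
    # Hypothetical guardrail: without budget info the lead cannot be "hot".
    if lead.get("no_budget_info"):
        score = min(score, 89.0)
    return score


print(hybrid_score(0.5, {}))                         # model score passes through
print(hybrid_score(1.0, {"no_budget_info": True}))   # capped by the rule
print(hybrid_score(1.0, {"disqualified": True}))     # overridden by the rule
```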
Build process
1. Define «conversion»
- Closed-won deal?
- Trial started?
- First payment?
Different targets → different models.
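Whichever event you pick, the label has to be computed the same way for every lead. A minimal sketch with pandas, assuming a hypothetical table with `created_at` and `converted_at` columns and an assumed 90-day window:

```python
import pandas as pd

# Hypothetical lead table: creation date and, if any, the conversion date.
leads = pd.DataFrame({
    "lead_id": [1, 2, 3],
    "created_at": pd.to_datetime(["2024-01-01", "2024-01-10", "2024-02-01"]),
    "converted_at": pd.to_datetime(["2024-01-20", "2024-06-01", None]),
})

N_DAYS = 90  # assumed conversion window

days_to_convert = (leads["converted_at"] - leads["created_at"]).dt.days
# A missing conversion date gives NaN, and NaN <= N_DAYS is False,
# so leads that never converted are labeled 0.
leads["converted"] = (days_to_convert <= N_DAYS).astype(int)
print(leads[["lead_id", "converted"]])
```

Lead 2 did convert, but after the window, so it is labeled 0. Leads created less than N days before the data export have not had the full window yet and are usually excluded from training.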
2. Historical data
Leads who converted vs those who didn't.
Timeframe matters: leads from 3 years ago came from a different market.
3. Features
Brainstorm + experiment.
Behavioral
- Website pages visited
- Content downloaded
- Emails opened / clicked
- Product demo watched
- Session count
- Days since last visit
Demographic
- Industry
- Company size
- Seniority (C-level, manager, IC)
- Geography
Explicit
- Budget
- Timeline
- Need articulated
Engagement score
Combines multiple behaviors.
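One simple way to combine behaviors, shown here as a sketch: scale each behavior to [0, 1] and take a weighted sum. The column names and the weights are assumptions for illustration; in practice the weights would be tuned or learned.

```python
import pandas as pd

# Hypothetical behavior counts, one row per lead.
behaviors = pd.DataFrame({
    "page_views": [2, 10, 6],
    "emails_clicked": [0, 4, 2],
    "session_count": [1, 5, 3],
})
# Assumed weights; they sum to 1 so the result stays in [0, 1].
WEIGHTS = {"page_views": 0.5, "emails_clicked": 0.3, "session_count": 0.2}

# Min-max scale each column so that no single raw count dominates.
span = (behaviors.max() - behaviors.min()).replace(0, 1)  # avoid division by zero
scaled = (behaviors - behaviors.min()) / span

engagement = sum(scaled[col] * weight for col, weight in WEIGHTS.items())
print(engagement)
```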
4. Train model
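from sklearn.linear_model import LogisticRegression
# `leads` is assumed to be a pandas DataFrame with one row per lead.
# ... plus any other feature columns; all of them must be numeric.
feature_cols = ['page_views', 'demos_watched', 'company_size']
X = leads[feature_cols]
y = leads['converted']  # 1 if the lead converted, else 0
model = LogisticRegression()
model.fit(X, y)
# Predict: probability of conversion, rescaled to 0-100
leads['score'] = model.predict_proba(X)[:, 1] * 100
5. Validate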
- AUC on holdout
- Calibration (predicted vs actual rates)
- Business uplift (sales close rate)
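The first two checks can be run offline. The sketch below uses synthetic data from `make_classification` as a stand-in for a real lead table (an assumption, since no real data is available here); business uplift cannot be measured this way and needs a comparison on live sales data.

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real lead table.
X, y = make_classification(n_samples=2000, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]

# AUC on the holdout set only, never on the training rows.
auc = roc_auc_score(y_test, proba)
# Calibration: observed conversion rate vs mean predicted probability per bin.
observed_rate, predicted_rate = calibration_curve(y_test, proba, n_bins=5)

print(f"holdout AUC: {auc:.3f}")
for pred, obs in zip(predicted_rate, observed_rate):
    print(f"predicted {pred:.2f} -> observed {obs:.2f}")
```

For a well-calibrated model the two columns are close to each other, which matters when the score is shown to sales as a probability.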
6. Deploy
The score is recalculated daily or in real time.
Integration with the CRM (Salesforce, HubSpot).
Metrics
Offline
- AUC / Gini
- Precision@top-N (conversion rate among the top-ranked leads, e.g. the top 20%)
- Lift (conversion rate in the top decile vs the average conversion rate)
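Precision at the top and lift are easy to compute by hand. A minimal sketch with NumPy; the function names and the toy data are illustrative.

```python
import numpy as np


def precision_at_top(y_true, scores, fraction=0.2):
    """Conversion rate among the top `fraction` of leads ranked by score."""
    y_true = np.asarray(y_true)
    n_top = max(1, int(round(len(y_true) * fraction)))
    top_idx = np.argsort(scores)[::-1][:n_top]  # indices of the highest scores
    return y_true[top_idx].mean()


def lift_at_top(y_true, scores, fraction=0.1):
    """Top-fraction conversion rate divided by the overall conversion rate."""
    return precision_at_top(y_true, scores, fraction) / np.mean(y_true)


# Toy data: 10 leads, 3 conversions, scores already sorted for readability.
y_true = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [95, 90, 85, 80, 60, 50, 40, 30, 20, 10]

print(precision_at_top(y_true, scores, fraction=0.2))
print(lift_at_top(y_true, scores, fraction=0.1))
```

Here the top 20% is two leads and both converted, while the overall rate is 30%, so the top decile converts a bit over three times better than average.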
Business
- Close rate of scored leads
- Sales time to close
- Win rate by score bucket
- Revenue per lead
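Win rate by score bucket is a plain group-by once the deals are labeled. A sketch with pandas; the deal data and the bucket boundaries are illustrative assumptions.

```python
import pandas as pd

# Hypothetical closed deals: the score at handoff and whether the deal was won.
deals = pd.DataFrame({
    "score": [95, 92, 75, 72, 55, 30, 20, 10],
    "won":   [1, 1, 1, 0, 0, 0, 0, 0],
})

# Left-closed bins: [0, 50), [50, 70), [70, 90), [90, 101) so that 100 is included.
deals["bucket"] = pd.cut(
    deals["score"],
    bins=[0, 50, 70, 90, 101],
    right=False,
    labels=["cold", "nurture", "warm", "hot"],
)

win_rate = deals.groupby("bucket", observed=False)["won"].mean()
print(win_rate)
```

If the score works, the win rate should rise from the lowest bucket to the highest; a flat or inverted pattern suggests the score is not helping sales.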
Score interpretation
Bucket leads:
- 90-100: hot. Immediate call.
- 70-89: warm. Call this week.
- 50-69: nurture. Email cadence.
- 0-49: cold. Marketing content only.
Sales team prioritizes top buckets.
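The bucket table above maps directly to a small function. A sketch, with the thresholds taken from the list and the action strings paraphrased from it:

```python
def bucket(score: float) -> tuple[str, str]:
    """Map a 0-100 score to (bucket, recommended action) per the table above."""
    if score >= 90:
        return "hot", "immediate call"
    if score >= 70:
        return "warm", "call this week"
    if score >= 50:
        return "nurture", "email cadence"
    return "cold", "marketing content only"


for s in (97, 70, 69, 12):
    print(s, bucket(s))
```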
BANT
Classic qualification framework:
- Budget
- Authority
- Need
- Timeline
Features for the score: «has budget», «decision-maker», «active need», «has timeline».
Common mistakes
Survivorship bias
Training only on «converted» leads loses the information about those who «never converted».
Include both.
Feature leakage
The feature «demo watched» correlates with conversion because sales is already engaged by that point. Use only upstream features, known before the sales team gets involved.
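One way to enforce this, sketched with pandas: keep only the events that happened before the first sales contact. The table layout and the `first_sales_contact` column are assumptions; for leads that were never contacted, a real pipeline would cut off at the scoring timestamp instead.

```python
import pandas as pd

# Hypothetical event log.
events = pd.DataFrame({
    "lead_id": [1, 1, 1, 2],
    "event": ["pricing_page", "demo_watched", "whitepaper", "pricing_page"],
    "event_time": pd.to_datetime(
        ["2024-03-01", "2024-03-10", "2024-03-02", "2024-03-05"]),
})
# Hypothetical: first sales touch per lead (missing = never contacted).
first_contact = pd.DataFrame({
    "lead_id": [1, 2],
    "first_sales_contact": pd.to_datetime(["2024-03-05", None]),
})

merged = events.merge(first_contact, on="lead_id", how="left")
# Keep an event if sales never contacted the lead or the event came first.
upstream = merged[
    merged["first_sales_contact"].isna()
    | (merged["event_time"] < merged["first_sales_contact"])
]
print(upstream[["lead_id", "event"]])
```

The demo view for lead 1 happened after the sales contact, so it is dropped before feature building.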
Wrong label
«Converted» needs a consistent definition. The timeframe matters.
Model drift
Market changes → model stale. Retrain quarterly.
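Besides a fixed retraining schedule, a simple check can flag drift earlier. The sketch below compares AUC on recent, already-labeled leads with the AUC measured at training time; the function name and the 0.05 tolerance are assumptions, not a standard.

```python
from sklearn.metrics import roc_auc_score


def needs_retraining(baseline_auc, y_recent, scores_recent, tolerance=0.05):
    """True if AUC on recent labeled leads fell more than `tolerance` below baseline."""
    recent_auc = roc_auc_score(y_recent, scores_recent)
    return bool(recent_auc < baseline_auc - tolerance)


# Scores still rank converters on top: no retraining needed.
print(needs_retraining(0.80, [1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2]))
# Ranking is inverted on recent data: retrain.
print(needs_retraining(0.80, [1, 0, 1, 0], [0.1, 0.9, 0.2, 0.8]))
```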
MQL vs SQL
MQL (Marketing Qualified Lead)
Has shown interest and is passed to sales.
SQL (Sales Qualified Lead)
Sales has confirmed the potential. Ready for a real conversation.
Lead scoring often defines the MQL threshold.
Lead scoring for B2C
Less common, but some products (e.g., subscription upgrades) can use it.
«Which users will upgrade in the next 30 days?» → scoring.
Integration
CRM
Salesforce, HubSpot — score fields.
Marketing automation
HubSpot, Marketo: trigger campaigns based on score.
Analytics
Amplitude / Mixpanel — segment by score.
In the interview
«How do you build lead scoring?» Define the target → features → train the model → validate → deploy.
«Rules vs model?» Rules for simple cases and small volumes, a model for scale and complexity.
«Metrics?» Business (close rate by bucket) + model (AUC).
«Common mistakes?» Feature leakage, label definition, model drift.
Related topics
- Logistic regression
- What scoring models are
- Attribution in plain words
- AARRR pirate metrics
FAQ
Does a startup need it?
The sales volume has to justify the effort. Below 100 leads/month, manual prioritization is fine.
Python vs ML tools?
Python gives flexibility. CRM tools have built-in scoring, but it is basic.
How often to update?
Daily or real-time is ideal. Weekly at minimum.