SQL для fraud detection
Карьерник — квиз-тренажёр в Telegram с 1500+ вопросами для собесов аналитика. SQL, Python, A/B, метрики. Бесплатно.
Зачем это знать
Fraud detection — одна из highest-impact analyst tasks. В fintech / e-commerce / marketplace каждый предотвращённый fraud = сохранённые деньги. SQL позволяет quickly test rules и find patterns.
На собесах в банки, платежные системы fraud detection SQL — обязательный знать.
Типы fraud
1. Account takeover
Hacker логинится в real account.
Signals:
- Login от нового device / IP
- Unusual transaction right after login
- Multiple failed logins
2. Card testing
Проверяет stolen card через small amounts.
Signals:
- Many small transactions
- Different merchants
- High decline rate
3. Promo abuse
Multiple accounts для discount.
Signals:
- Same device / IP
- Same payment method
- Sequential emails
4. Chargeback fraud
Claims false fraud after legitimate purchase.
Signals:
- First purchase
- High value
- Delivery disputes
Velocity rules
«Больше N transactions за X минут»:
SELECT user_id, COUNT(*) AS tx_count
FROM transactions
WHERE created_at >= NOW() - INTERVAL '5 minutes'
GROUP BY user_id
HAVING COUNT(*) > 10;Flag для review.
Multiple cards, one user
SELECT
user_id,
COUNT(DISTINCT card_fingerprint) AS unique_cards
FROM transactions
WHERE created_at >= CURRENT_DATE - 30
GROUP BY user_id
HAVING COUNT(DISTINCT card_fingerprint) > 5;Multiple users, one card
SELECT
card_fingerprint,
COUNT(DISTINCT user_id) AS unique_users
FROM transactions
GROUP BY card_fingerprint
HAVING COUNT(DISTINCT user_id) > 3;Red flag.
Geographic anomaly
Transaction from USA, potом through minute — Australia. Невозможно.
WITH with_prev AS (
SELECT
user_id,
transaction_id,
created_at,
country,
LAG(country) OVER (PARTITION BY user_id ORDER BY created_at) AS prev_country,
LAG(created_at) OVER (PARTITION BY user_id ORDER BY created_at) AS prev_time
FROM transactions
)
SELECT *
FROM with_prev
WHERE country != prev_country
AND EXTRACT(EPOCH FROM (created_at - prev_time)) / 3600 < 1
-- travel time < 1 hour между странами
;Amount anomaly
Amount > 3σ от user's typical:
WITH user_stats AS (
SELECT
user_id,
AVG(amount) AS avg_amount,
STDDEV(amount) AS std_amount
FROM transactions
WHERE created_at < CURRENT_DATE
GROUP BY user_id
)
SELECT t.*, us.avg_amount, us.std_amount
FROM transactions t
JOIN user_stats us ON t.user_id = us.user_id
WHERE ABS(t.amount - us.avg_amount) > 3 * us.std_amount
AND t.created_at >= CURRENT_DATE;Time-of-day anomaly
User never transactions в 3AM:
WITH user_hours AS (
SELECT user_id, ARRAY_AGG(DISTINCT EXTRACT(HOUR FROM created_at)) AS normal_hours
FROM transactions
WHERE created_at < CURRENT_DATE - 7
GROUP BY user_id
)
SELECT t.*
FROM transactions t
JOIN user_hours uh ON t.user_id = uh.user_id
WHERE NOT (EXTRACT(HOUR FROM t.created_at) = ANY(uh.normal_hours))
AND t.created_at >= CURRENT_DATE;Same device multi-account
SELECT
device_fingerprint,
COUNT(DISTINCT user_id) AS users_on_device
FROM sessions
WHERE created_at >= CURRENT_DATE - 30
GROUP BY device_fingerprint
HAVING COUNT(DISTINCT user_id) > 5;Graph analysis
Connect users через shared attributes:
-- Users sharing card
SELECT DISTINCT a.user_id AS user_a, b.user_id AS user_b
FROM transactions a
JOIN transactions b ON a.card_fingerprint = b.card_fingerprint
WHERE a.user_id != b.user_id;Build fraud ring detection.
Chargeback rate per merchant
SELECT
merchant_id,
COUNT(CASE WHEN is_chargeback THEN 1 END) * 1.0 /
COUNT(*) AS chargeback_rate,
COUNT(*) AS total_tx
FROM transactions
WHERE created_at >= CURRENT_DATE - 90
GROUP BY merchant_id
HAVING COUNT(*) > 100
ORDER BY chargeback_rate DESC;High rate → risk merchant.
Alert system
Combine rules в scoring:
WITH risk_flags AS (
SELECT
t.transaction_id,
CASE WHEN amount > user_stats.p99_amount THEN 1 ELSE 0 END AS amt_flag,
CASE WHEN country != user_stats.usual_country THEN 1 ELSE 0 END AS geo_flag,
CASE WHEN hour NOT IN usual_hours THEN 1 ELSE 0 END AS time_flag,
...
FROM transactions t
JOIN user_stats USING (user_id)
WHERE t.created_at >= NOW() - INTERVAL '1 hour'
)
SELECT *, amt_flag + geo_flag + time_flag + ... AS risk_score
FROM risk_flags
WHERE amt_flag + geo_flag + time_flag + ... >= 3;3+ flags → review.
Machine learning extension
SQL rules — baseline. ML (XGBoost, logistic) для better accuracy.
Features:
- Velocity stats
- User history
- Device
- Network
Performance
Fraud queries — real-time часто. Optimize:
- Index
transactions(user_id, created_at) - Partition by date
- Materialized views для user_stats
- Stream processing (не batch) для urgent
На собесе
«Как detected fraud transaction?»
Structure answer:
- Rules (velocity, amount, geo)
- Behavioral (departure от user's norm)
- Graph (shared attributes)
- ML layer
Не «одна rule»—comprehensive.
Частые ошибки
False positives
Aggressive rules → block legitimate. Trade-off.
Static rules
Fraudsters adapt. Rules должны evolve.
Ignore velocity
One transaction seems ok, 10 за minute — red flag.
No user baseline
«$1000 transaction» — huge для one, normal для other. Context matters.
Связанные темы
FAQ
Real-time или batch?
Both needed. Real-time для block. Batch для review.
Precision vs recall?
Depends. Over-block легитимных — lose money. Miss fraud — lose money. Balance.
Privacy?
Strict. Fraud data — sensitive.
Тренируйте SQL — откройте тренажёр с 1500+ вопросами для собесов.