SQL для fraud detection

Карьерник — квиз-тренажёр в Telegram с 1500+ вопросами для собесов аналитика. SQL, Python, A/B, метрики. Бесплатно.

Зачем это знать

Fraud detection — одна из highest-impact analyst tasks. В fintech / e-commerce / marketplace каждый предотвращённый fraud = сохранённые деньги. SQL позволяет quickly test rules и find patterns.

На собесах в банки, платежные системы fraud detection SQL — обязательный знать.

Типы fraud

1. Account takeover

Hacker логинится в real account.

Signals:

  • Login от нового device / IP
  • Unusual transaction right after login
  • Multiple failed logins

2. Card testing

Проверяет stolen card через small amounts.

Signals:

  • Many small transactions
  • Different merchants
  • High decline rate

3. Promo abuse

Multiple accounts для discount.

Signals:

  • Same device / IP
  • Same payment method
  • Sequential emails

4. Chargeback fraud

Claims false fraud after legitimate purchase.

Signals:

  • First purchase
  • High value
  • Delivery disputes

Velocity rules

«Больше N transactions за X минут»:

SELECT user_id, COUNT(*) AS tx_count
FROM transactions
WHERE created_at >= NOW() - INTERVAL '5 minutes'
GROUP BY user_id
HAVING COUNT(*) > 10;

Flag для review.

Multiple cards, one user

SELECT
    user_id,
    COUNT(DISTINCT card_fingerprint) AS unique_cards
FROM transactions
WHERE created_at >= CURRENT_DATE - 30
GROUP BY user_id
HAVING COUNT(DISTINCT card_fingerprint) > 5;

Multiple users, one card

SELECT
    card_fingerprint,
    COUNT(DISTINCT user_id) AS unique_users
FROM transactions
GROUP BY card_fingerprint
HAVING COUNT(DISTINCT user_id) > 3;

Red flag.

Geographic anomaly

Transaction from USA, potом through minute — Australia. Невозможно.

WITH with_prev AS (
    SELECT
        user_id,
        transaction_id,
        created_at,
        country,
        LAG(country) OVER (PARTITION BY user_id ORDER BY created_at) AS prev_country,
        LAG(created_at) OVER (PARTITION BY user_id ORDER BY created_at) AS prev_time
    FROM transactions
)
SELECT *
FROM with_prev
WHERE country != prev_country
    AND EXTRACT(EPOCH FROM (created_at - prev_time)) / 3600 < 1
    -- travel time < 1 hour между странами
;

Amount anomaly

Amount > 3σ от user's typical:

WITH user_stats AS (
    SELECT
        user_id,
        AVG(amount) AS avg_amount,
        STDDEV(amount) AS std_amount
    FROM transactions
    WHERE created_at < CURRENT_DATE
    GROUP BY user_id
)
SELECT t.*, us.avg_amount, us.std_amount
FROM transactions t
JOIN user_stats us ON t.user_id = us.user_id
WHERE ABS(t.amount - us.avg_amount) > 3 * us.std_amount
    AND t.created_at >= CURRENT_DATE;

Time-of-day anomaly

User never transactions в 3AM:

WITH user_hours AS (
    SELECT user_id, ARRAY_AGG(DISTINCT EXTRACT(HOUR FROM created_at)) AS normal_hours
    FROM transactions
    WHERE created_at < CURRENT_DATE - 7
    GROUP BY user_id
)
SELECT t.*
FROM transactions t
JOIN user_hours uh ON t.user_id = uh.user_id
WHERE NOT (EXTRACT(HOUR FROM t.created_at) = ANY(uh.normal_hours))
    AND t.created_at >= CURRENT_DATE;

Same device multi-account

SELECT
    device_fingerprint,
    COUNT(DISTINCT user_id) AS users_on_device
FROM sessions
WHERE created_at >= CURRENT_DATE - 30
GROUP BY device_fingerprint
HAVING COUNT(DISTINCT user_id) > 5;

Graph analysis

Connect users через shared attributes:

-- Users sharing card
SELECT DISTINCT a.user_id AS user_a, b.user_id AS user_b
FROM transactions a
JOIN transactions b ON a.card_fingerprint = b.card_fingerprint
WHERE a.user_id != b.user_id;

Build fraud ring detection.

Chargeback rate per merchant

SELECT
    merchant_id,
    COUNT(CASE WHEN is_chargeback THEN 1 END) * 1.0 /
    COUNT(*) AS chargeback_rate,
    COUNT(*) AS total_tx
FROM transactions
WHERE created_at >= CURRENT_DATE - 90
GROUP BY merchant_id
HAVING COUNT(*) > 100
ORDER BY chargeback_rate DESC;

High rate → risk merchant.

Alert system

Combine rules в scoring:

WITH risk_flags AS (
    SELECT
        t.transaction_id,
        CASE WHEN amount > user_stats.p99_amount THEN 1 ELSE 0 END AS amt_flag,
        CASE WHEN country != user_stats.usual_country THEN 1 ELSE 0 END AS geo_flag,
        CASE WHEN hour NOT IN usual_hours THEN 1 ELSE 0 END AS time_flag,
        ...
    FROM transactions t
    JOIN user_stats USING (user_id)
    WHERE t.created_at >= NOW() - INTERVAL '1 hour'
)
SELECT *, amt_flag + geo_flag + time_flag + ... AS risk_score
FROM risk_flags
WHERE amt_flag + geo_flag + time_flag + ... >= 3;

3+ flags → review.

Machine learning extension

SQL rules — baseline. ML (XGBoost, logistic) для better accuracy.

Features:

  • Velocity stats
  • User history
  • Device
  • Network

Performance

Fraud queries — real-time часто. Optimize:

  • Index transactions(user_id, created_at)
  • Partition by date
  • Materialized views для user_stats
  • Stream processing (не batch) для urgent

На собесе

«Как detected fraud transaction?»

Structure answer:

  • Rules (velocity, amount, geo)
  • Behavioral (departure от user's norm)
  • Graph (shared attributes)
  • ML layer

Не «одна rule»—comprehensive.

Частые ошибки

False positives

Aggressive rules → block legitimate. Trade-off.

Static rules

Fraudsters adapt. Rules должны evolve.

Ignore velocity

One transaction seems ok, 10 за minute — red flag.

No user baseline

«$1000 transaction» — huge для one, normal для other. Context matters.

Связанные темы

FAQ

Real-time или batch?

Both needed. Real-time для block. Batch для review.

Precision vs recall?

Depends. Over-block легитимных — lose money. Miss fraud — lose money. Balance.

Privacy?

Strict. Fraud data — sensitive.


Тренируйте SQL — откройте тренажёр с 1500+ вопросами для собесов.