SQL вопросы на собесе в Яндекс

Карьерник — квиз-тренажёр в Telegram с 1500+ вопросами для собесов аналитика. SQL, Python, A/B, метрики. Бесплатно.

Зачем это знать

Yandex SQL bar — один из высоких в Russia. Подготовка — different от average.

Задачи real-world, оптимизация матерiтся, advanced patterns.

Процесс

Usually 60-90 min technical round:

  • Live coding (shared whiteboard / editor)
  • 2-4 задачи уровня middle-senior
  • Think aloud требование

Типы задач

1. Window functions

Expected fluent:

  • ROW_NUMBER, RANK, DENSE_RANK
  • LAG, LEAD
  • SUM OVER, AVG OVER
  • Frame specification (ROWS / RANGE)

2. Complex JOINs

  • Self-joins
  • LATERAL
  • Multiple conditions
  • Anti-joins

3. Optimization

  • EXPLAIN interpretation
  • Index strategy
  • Query rewriting

4. CTE / recursive

  • Hierarchies
  • Date sequences
  • Graph traversal

5. ClickHouse specific

  • Arrays
  • quantile() functions
  • sumIf / countIf
  • materialized views

Пример задача 1

«Таблица events(user_id, event_type, created_at). Найти сессии пользователей — session = events without > 30 min gap».

Solution

WITH events_gap AS (
    SELECT
        user_id,
        event_type,
        created_at,
        EXTRACT(EPOCH FROM (created_at - LAG(created_at) OVER (PARTITION BY user_id ORDER BY created_at))) / 60 AS gap_minutes
    FROM events
),
session_id AS (
    SELECT
        user_id,
        event_type,
        created_at,
        SUM(CASE WHEN gap_minutes > 30 OR gap_minutes IS NULL THEN 1 ELSE 0 END)
            OVER (PARTITION BY user_id ORDER BY created_at) AS session_num
    FROM events_gap
)
SELECT
    user_id,
    session_num,
    MIN(created_at) AS session_start,
    MAX(created_at) AS session_end,
    COUNT(*) AS events_count
FROM session_id
GROUP BY user_id, session_num;

Sessionization classic.

Пример задача 2

«Top-3 purchases каждого user за последний год».

Solution

WITH ranked AS (
    SELECT
        user_id,
        purchase_id,
        amount,
        DENSE_RANK() OVER (PARTITION BY user_id ORDER BY amount DESC) AS rnk
    FROM purchases
    WHERE created_at >= CURRENT_DATE - 365
)
SELECT user_id, purchase_id, amount
FROM ranked
WHERE rnk <= 3;

Windowed rank.

Пример задача 3

«Retention D7 для cohort апреля 2026».

Solution

WITH cohort AS (
    SELECT user_id, DATE(MIN(created_at)) AS signup_date
    FROM users
    WHERE MIN(created_at) BETWEEN '2026-04-01' AND '2026-04-30'
    GROUP BY user_id
),
active_d7 AS (
    SELECT DISTINCT user_id
    FROM events
    WHERE DATE(created_at) = signup_date + 7
)
SELECT
    COUNT(*) AS cohort_size,
    COUNT(DISTINCT a.user_id) AS retained,
    COUNT(DISTINCT a.user_id) * 1.0 / COUNT(*) AS d7_retention
FROM cohort c
LEFT JOIN active_d7 a USING (user_id);

Пример задача 4

«Running 7-day active users».

Solution

SELECT
    day,
    COUNT(DISTINCT user_id) AS dau,
    -- Rolling 7-day (need generative approach)
    (SELECT COUNT(DISTINCT user_id)
     FROM events
     WHERE DATE(created_at) BETWEEN t.day - 6 AND t.day) AS wau
FROM (
    SELECT DISTINCT DATE(created_at) AS day FROM events
) t
ORDER BY day;

Или с window (tricky для distinct).

Пример задача 5

«Funnel: signup → activation → purchase в sequence».

Solution

WITH user_events AS (
    SELECT
        user_id,
        MAX(CASE WHEN event = 'signup' THEN created_at END) AS signup_at,
        MAX(CASE WHEN event = 'activation' THEN created_at END) AS activation_at,
        MAX(CASE WHEN event = 'purchase' THEN created_at END) AS purchase_at
    FROM events
    GROUP BY user_id
)
SELECT
    COUNT(*) FILTER (WHERE signup_at IS NOT NULL) AS signed,
    COUNT(*) FILTER (WHERE signup_at IS NOT NULL AND activation_at > signup_at) AS activated,
    COUNT(*) FILTER (WHERE signup_at IS NOT NULL AND activation_at > signup_at AND purchase_at > activation_at) AS purchased
FROM user_events;

Sequential filter.

Пример задача 6

«Оптимизировать query, running 10 min».

Process:

  1. Run EXPLAIN ANALYZE
  2. Identify:
    • Seq Scan → need index
    • Nested Loop → could hash join
    • Sort → ORDER BY на indexed col?
  3. Add indexes
  4. Consider partitioning
  5. Rewrite если возможно
  6. Materialize if repeated

Narrate thinking.

Пример задача 7

«Recursive CTE — organization hierarchy».

WITH RECURSIVE hierarchy AS (
    SELECT id, name, manager_id, 1 AS level
    FROM employees
    WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, e.name, e.manager_id, h.level + 1
    FROM employees e
    JOIN hierarchy h ON e.manager_id = h.id
)
SELECT * FROM hierarchy ORDER BY level, name;

ClickHouse-specific

Yandex heavy ClickHouse. Expected:

Arrays

SELECT user_id, arrayJoin(tags) FROM users_with_tags;

Quantiles

SELECT quantile(0.95)(response_time) FROM events;

sumIf

SELECT
    sumIf(revenue, country = 'RU') AS ru_rev,
    sumIf(revenue, country = 'US') AS us_rev
FROM transactions;

Materialized views

CREATE MATERIALIZED VIEW daily_summary
ENGINE = SummingMergeTree()
ORDER BY day
AS SELECT toStartOfDay(ts) AS day, count() AS events
FROM raw_events
GROUP BY day;

Communication

Think aloud

Verbalize thought process.

«I need use LAG для sessionization. Let me first understand gap».

Show structured thinking.

Clarify

«Sessions defined как? > 30 min gap?».

Before coding.

Edge cases

«What if user has one event?» «NULL handling?»

Show thoroughness.

Optimization

«This works, но could be faster».

Demonstrate awareness.

Red flags

  • Jump to code без thinking
  • Miss edge cases
  • Не use windows when appropriate
  • Forget NULL handling
  • No query optimization knowledge

Green flags

  • Clarify requirements
  • Multiple approaches
  • Optimization thoughts
  • Clean syntax
  • Edge cases considered

Prep resources

  • LeetCode Database Medium-Hard
  • StrataScratch real company questions
  • ClickHouse documentation tutorial
  • Карьерник Russian SQL задачи

Specific к Yandex teams

Yandex.Search

Ranking, session analysis.

Yandex.Ads

Campaign metrics, budget.

Yandex.Market

Marketplace.

Yandex.Taksi

Two-sided, geo.

Yandex.Cloud

Resource usage, billing.

Different focus per team.

На собесе

Before

Сверьте basics. Memory refresh syntax.

During

  • Greet friendly
  • Clarify
  • Walk approach
  • Code carefully
  • Test
  • Explain trade-offs

После

Ask about team, role.

Зарплаты

Yandex top:

  • Middle: 200-300k
  • Senior: 300-450k
  • Lead: 450-700k

Plus bonus, RSU.

Подготовка

See подготовка SQL 7 дней.

Plus Yandex-specific:

  • Learn ClickHouse basics
  • Practice optimization questions
  • Sessionization patterns

Связанные темы

FAQ

English or Russian?

Russian usually. English sometimes.

Calculator allowed?

Usually. But mental math preferred для simple.

Hints given?

Sometimes if stuck. Don't wait too long.


Готовьтесь — откройте тренажёр с 1500+ вопросами для собесов.