dbt для аналитика с нуля
Карьерник — квиз-тренажёр в Telegram с 1500+ вопросами для собесов аналитика. SQL, Python, A/B, метрики. Бесплатно.
Зачем это знать
dbt — революция в analytics engineering. Transforms DWH через SQL, с tests, docs, lineage. За last 5 лет стал industry-standard в modern data stack. В Yandex, Tinkoff, Avito и международных tech companies — requirement middle+ аналитика.
Что такое dbt
dbt (data build tool) — framework для SQL transformations в warehouse:
- Write SELECT queries (models)
- dbt создаёт tables / views
- Manage dependencies
- Tests / docs / lineage
Не ETL (extraction, load — другие tools). Только Transform (T в ELT).
Философия
ELT не ETL
Modern stack: Extract + Load в DWH, Transform в DWH (SQL).
Раньше: python scripts transform → DWH.
Теперь: dbt transforms in warehouse (fast, scalable).
Analytics engineering
Новая роль между analyst и data engineer. Ответственна за trusted layer в DWH.
Main концепции
Models
SQL files с SELECT — становятся tables / views:
-- models/daily_revenue.sql
SELECT
DATE(created_at) AS day,
SUM(total) AS revenue
FROM raw.orders
GROUP BY 1dbt run → CREATE TABLE daily_revenue AS ....
ref
Dependencies между models:
SELECT * FROM {{ ref('stg_orders') }}{{ ref() }} — dbt resolves к actual table name.
Sources
External tables:
SELECT * FROM {{ source('raw', 'orders') }}Tests
# models/schema.yml
models:
- name: orders
columns:
- name: id
tests: [unique, not_null]
- name: user_id
tests:
- relationships:
to: ref('users')
field: iddbt test — runs all tests.
Docs
Auto-generated documentation site с data lineage graph.
Структура проекта
dbt_project/
├── dbt_project.yml
├── models/
│ ├── staging/
│ │ ├── stg_orders.sql
│ │ └── stg_users.sql
│ ├── intermediate/
│ └── marts/
│ ├── core/
│ └── marketing/
├── tests/
├── macros/
└── seeds/Layers: staging (cleaned), intermediate, marts (business-ready).
Макросы (Jinja)
SQL + Python-like templating:
{% SET regions = ['RU', 'UA', 'KZ'] %}
SELECT
{% for r IN regions %}
SUM(CASE WHEN country = '{{ r }}' THEN revenue END) AS rev_{{ r }}{% if NOT loop.last %},{% endif %}
{% endfor %}
FROM ordersDry-up repeating patterns.
Incremental models
Large tables — не rebuild fully каждый run:
{{ config(materialized='incremental') }}
SELECT * FROM raw.events
{% if is_incremental() %}
WHERE created_at > (SELECT MAX(created_at) FROM {{ this }})
{% endif %}Only new rows processed.
dbt Cloud vs Core
dbt Core
- Open-source CLI
- Self-host
- Free
dbt Cloud
- Managed UI
- Scheduling
- $100+/user/month
Teams часто: Core для dev, Cloud для production scheduling.
Workflow
# Develop
dbt run --select model_name
# Test
dbt test --select model_name
# Docs
dbt docs generate
dbt docs serve
# Full run
dbt run
dbt testDeployment
Scheduler
- dbt Cloud
- Airflow + dbt Core
- GitHub Actions
Обычно: dbt runs каждые 1-24 hours.
Git integration
Standard SWE workflow:
- Feature branch
- PR review
- Merge to main
- Deploy
Связь с BI
dbt выводит clean tables → BI (Looker, Metabase, Tableau) reads.
Single source of truth — metric definition в dbt, не в каждом BI.
Macros, packages
dbt-utils, dbt-expectations — collections of utilities:
# packages.yml
packages:
- package: dbt-labs/dbt_utils
version: 1.0.0-- Use
SELECT {{ dbt_utils.generate_surrogate_key(['col1', 'col2']) }} AS id
FROM TABLETests
Built-in
- not_null
- unique
- accepted_values
- relationships
Custom
-- tests/assert_positive_revenue.sql
SELECT * FROM {{ ref('revenue') }}
WHERE revenue < 0Test fails если row returned.
Snapshots
Track slowly-changing dimensions:
{% snapshot users_snapshot %}
{{
config(
target_schema='snapshots',
unique_key='id',
strategy='check',
check_cols=['email', 'plan']
)
}}
SELECT * FROM {{ source('raw', 'users') }}
{% endsnapshot %}Roles
Data engineer
Builds ingestion, manages infra.
Analytics engineer
Writes dbt models, owns transformations.
Analyst
Uses dbt models for analyses.
Blurring роли, но dbt — primary tool для analytics engineer.
На собесе
«dbt experience?» Modeling layers, tests, docs, incremental.
«dbt vs stored procedures?» dbt — git-based, testable, documented. Stored procs — legacy.
«Why dbt?» Software engineering best practices в SQL world.
Связанные темы
FAQ
Python нужен?
Нет. Базово — только SQL + Jinja templating.
Cloud vs Core?
Core для start. Cloud когда нужен scheduling UI и team scale.
Learning curve?
Пара недель для analyst с SQL background.
Тренируйте Data — откройте тренажёр с 1500+ вопросами для собесов.