23 апреля 2026 г.·5 мин чтения

Dimensional modeling для аналитика

Q: Analyst должен знать?

Middle+ — yes. Especially working с DWH.

Q: Обязательно Kimball?

Standard. Other approaches niches.

Q: Modern stack меняет?

Some flat-table trends. Fundamentals same.

Проверь себя · 1/3разбор после ответа

В таблице events(user_id, created_at) поле created_at имеет тип отметки времени с долями секунд. Как корректно посчитать количество событий по дням?

Зачем это знать

Data warehouse design — если вы работаете с DWH, это необходимо. Kimball dimensional modeling — industry standard.

На собесах middle+ questions о star schema, dim / fact, SCD — expected.

Foundation: Kimball approach

Ralph Kimball — data warehouse thought leader.

Idea: analytics-optimized schema, business-user-friendly.

Contrast OLTP (operational)

Normalized
Write-optimized
Entities normalized tables

OLAP (analytics)

Denormalized
Read-optimized
Fact + dimension tables

Dimensional modeling — for OLAP.

Fact tables

Core business events / measurements.

Properties

Many rows (millions-billions)
Few columns (numeric measures + foreign keys)
Grain — what each row represents

Example: Sales fact

fact_sales
- date_key (FK)
- product_key (FK)
- customer_key (FK)
- store_key (FK)
- quantity
- price
- revenue

Each row: one sale.

Types

Transactional: single event per row
Periodic snapshot: state at time (e.g., daily inventory)
Accumulating snapshot: milestones (e.g., order stages)

Dimension tables

Descriptive context.

Properties

Fewer rows (< 1M usually)
Many columns (descriptive attributes)
Primary key natural or surrogate

Example: Product dim

dim_product
- product_key (PK)
- product_id (natural)
- name
- category
- brand
- price
- color

Each row: one product.

Star schema

Fact table center, dim tables around.

        dim_product
             |
dim_date — fact_sales — dim_customer
             |
         dim_store

Simple joins (fact ↔ dim). Analytics-friendly.

Snowflake schema

Like star, but dim tables normalized (sub-dims).

dim_product → dim_category → dim_supercategory

Saves space, but more JOINs. Often less useful на modern scale.

Grain

Critical concept: what single row represents.

Examples

One sale transaction
One user-day
One order-line

Matters

Ambiguous grain → double-counting, wrong metrics.

Document clearly.

Foreign keys

Fact → dim через keys.

Surrogate keys

Artificial ID (integer). Not business key.

Reasons:

Performance (integer JOIN fast)
Handle history (SCD)
Simpler schema

Natural keys

Business identifier (product_sku).

Problems:

Can change
Hard standardize
Slower JOIN

Warehouse convention: surrogate keys mostly.

Slowly Changing Dimensions (SCD)

Dim attributes change. How track?

Type 1: Overwrite

New value replaces old. No history.

Simple. Lose history.

dim_customer.city: 'Moscow' → 'SPb'

Type 2: Add row

New row с new values. Old row valid_until timestamp.

customer_key | customer_id | city   | valid_from | valid_to
1            | 123         | Moscow | 2020-01    | 2023-06
2            | 123         | SPb    | 2023-06    | 9999-12

History preserved. Common choice.

Type 3: Add column

Current + previous.

city_current | city_previous

Limited history.

Type 4: History table

Separate table tracks changes.

Type 6: Hybrid

Combines multiple.

Most common: Type 1 (для unimportant) + Type 2 (important historical).

Degenerate dimensions

Dim attribute stored в fact table (no separate dim).

Example: order_id в fact_sales. No separate dim_order (no other attributes).

Saves JOIN.

Junk dimensions

Combine unrelated small dims:

dim_junk: (payment_method, shipping_method, promo_code_flag)

Instead 3 separate dims.

Готовься к собесу аналитика как в Duolingo

10 минут в день — SQL, Python, A/B, метрики. 1700+ вопросов в Telegram

Открыть Карьерник в Telegram

Role-playing dimensions

Same dim used multiple contexts.

Example: dim_date как «order_date» и «shipment_date». Same dim, different FK.

Bridge tables

Many-to-many между dim и fact.

Example: Sale → Multiple salespeople.

bridge_sale_salesperson: (sale_key, salesperson_key, commission_pct)

Example: e-commerce

Facts

fact_orders
fact_order_items
fact_web_sessions
fact_inventory (snapshot)

Dimensions

dim_date
dim_customer (SCD Type 2)
dim_product
dim_store
dim_campaign

Star queries

«Revenue по category месячно»:

SELECT
    d.month,
    p.category,
    SUM(f.revenue) AS total
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY d.month, p.category;

Clean, fast.

In practice

Tools

dbt — build dimensional model в SQL
Informatica / other ETL — legacy
Fivetran / Airbyte — extraction + transform

Stages

Raw: source system replica
Staged: cleaned, typed
Core: dimensional model (facts + dims)
Marts: specific use cases

dbt example

-- models/core/dim_customer.sql
SELECT
    ROW_NUMBER() OVER (ORDER BY customer_id) AS customer_key,
    customer_id,
    first_name,
    last_name,
    city,
    signup_date,
    CURRENT_DATE AS valid_from
FROM {{ ref('stg_customers') }};

Для analyst

Benefits

Simple JOINs
Fast aggregations
Business-friendly
Reusable

Query

SELECT dim.*, fact.* FROM fact_x
JOIN dim_y ON fact_x.dim_y_key = dim_y.dim_y_key;

Predictable.

SCD aware

When querying history — filter valid_to (Type 2).

Альтернативы

Data vault

Alternative methodology (Dan Linstedt). Hub / link / satellite.

Less common. Enterprise sometimes.

Flat tables

Modern approach: wide denormalized tables. Ok при cheap storage.

Trade-off: less flexible queries.

Activity schema

Event-based. Each row = event. Simpler.

Common mistakes

Wrong grain

Ambiguous counts.

No surrogate keys

Production issues when natural changes.

Too many joins

If query requires 10 joins — questionable design.

Ignoring history

No SCD → cannot answer historical queries.

Over-engineering

Simple business → over-complex schema.

На собесе

«Star schema что это?»

Central fact, surrounding dims. Analytics-friendly.

«Star vs snowflake?»

Star flat. Snowflake normalized dims.

«SCD types?»

1: overwrite. 2: history rows. 3: column add.

«Fact vs dim?»

Fact — measurements, events. Dim — descriptive attributes.

«Grain?»

What single row represents в fact table.

Books

Kimball «Data Warehouse Toolkit»
«Star Schema Complete Reference»

Связанные темы

FAQ

Analyst должен знать?

Middle+ — yes. Especially working с DWH.

Обязательно Kimball?

Standard. Other approaches niches.

Modern stack меняет?

Some flat-table trends. Fundamentals same.

Dimensional modeling для аналитика

Зачем это знать

Foundation: Kimball approach

Contrast OLTP (operational)

OLAP (analytics)

Fact tables

Properties

Example: Sales fact

Types

Dimension tables

Properties

Example: Product dim

Star schema

Snowflake schema

Grain

Examples

Matters

Foreign keys

Surrogate keys

Natural keys

Slowly Changing Dimensions (SCD)

Type 1: Overwrite

Type 2: Add row

Type 3: Add column

Type 4: History table

Type 6: Hybrid

Degenerate dimensions

Junk dimensions

Role-playing dimensions

Bridge tables

Example: e-commerce

Facts

Dimensions

Star queries

In practice

Tools

Stages

dbt example

Для analyst

Benefits

Query

SCD aware

Альтернативы

Data vault

Flat tables

Activity schema

Common mistakes

Wrong grain

No surrogate keys

Too many joins

Ignoring history

Over-engineering

На собесе

Books

Связанные темы

FAQ

Analyst должен знать?

Обязательно Kimball?

Modern stack меняет?

Ещё по теме