How to Write Commit Messages as an Analyst

Why this matters

An analyst in 2026 works in git: dbt projects, shared notebooks, version-controlled queries. Poor git habits → team friction, lost work.

Interviews may include basic git questions. Quality commit messages are a professional signal.

Git basics

Repository

A project with tracked files.

Commit

Snapshot changes.

Branch

A parallel line of work.

Merge

Combine branches.

Pull / Push

Sync remote.

Daily commands

# Status
git status

# Add / stage
git add file.sql
git add .  # all changes

# Commit
git commit -m "message"

# Push
git push origin branch-name

# Pull
git pull origin main

# Branch
git checkout -b new-feature
git checkout existing-branch

# Merge / update
git merge main

Commit messages

Structure

<type>: <short summary>

<optional body explaining what and why>

<optional footer: references, breaking changes>

Types

  • feat: new feature
  • fix: bug fix
  • refactor: code change that neither adds a feature nor fixes a bug
  • docs: documentation
  • test: adding tests
  • chore: maintenance
  • style: formatting

Good examples

feat(dashboard): add revenue trend chart to executive dashboard

fix(retention_model): correct cohort offset calculation

refactor(dim_customer): simplify SCD logic

Bad examples

update
fix things
wip
asdf
commit

Too vague. A month later nobody, including you, can tell what changed or why.
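
The structure above can be checked mechanically. Below is a minimal sketch, not a standard tool: `check_commit_msg` is a hypothetical helper name, and the regular expression assumes the seven types listed earlier plus an optional lowercase scope in parentheses.

```shell
# Hypothetical helper: checks the first line of a commit message against
# "<type>(<optional scope>): <summary>". The type list mirrors the one above.
check_commit_msg() {
    printf '%s\n' "$1" | head -n 1 |
        grep -Eq '^(feat|fix|refactor|docs|test|chore|style)(\([a-z0-9_-]+\))?: .+'
}

check_commit_msg "feat(dashboard): add revenue trend chart" && echo "ok"
check_commit_msg "fix things" || echo "rejected"
```

Git passes the path of the message file as the first argument to a `commit-msg` hook, so a hook built on this sketch would call `check_commit_msg "$(cat "$1")"` and exit non-zero on failure.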

Granularity

Per logical change

Each commit = one thought.

Good:
- feat(metrics): add new_visitor_rate metric
- fix(metrics): handle null session duration
- docs(metrics): document new_visitor_rate definition

Bad:
- refactor everything

Atomic

Each commit can be reverted on its own.

Frequency

Commit often. Push regularly.

Don't commit hundreds of files at once.

Branches

Naming

Convention:

feature/add-revenue-dashboard
fix/cohort-offset-bug
refactor/metric-definitions
analysis/q2-retention-deep-dive
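
A naming convention is easier to keep when it is checked automatically. The sketch below assumes the four prefixes shown above and a kebab-case description; `check_branch_name` is a hypothetical helper, not a git command.

```shell
# Hypothetical helper: accepts "<prefix>/<kebab-case-description>" using the
# four prefixes from the convention above.
check_branch_name() {
    printf '%s\n' "$1" |
        grep -Eq '^(feature|fix|refactor|analysis)/[a-z0-9]+(-[a-z0-9]+)*$'
}

check_branch_name "analysis/q2-retention-deep-dive" && echo "ok"
check_branch_name "my_new_branch" || echo "rejected"
```

Inside a repository, the current branch name can be fed in with `check_branch_name "$(git rev-parse --abbrev-ref HEAD)"`.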

Lifecycle

  1. Create from main
  2. Work, commit
  3. Push
  4. Pull request / merge request
  5. Review
  6. Merge
  7. Delete branch

Don't

  • Work directly on main
  • Keep long-lived feature branches (they go stale)
  • Leave unmerged work sitting locally for weeks

Pull requests

Title

Clear summary.

Description

  • What changed
  • Why
  • Related ticket / issue
  • Screenshots (if UI)

Review

Ask peers to check your changes. Analysts review each other's SQL and dbt code.

Pre-submit

  • Self-review diff
  • Run tests
  • Clean up

dbt git workflow

Typical

# New model: branch name follows the feature/ convention above
git checkout -b feature/add-weekly-revenue-model

# Edit the dbt model, then build it and run its tests
dbt run --select weekly_revenue
dbt test --select weekly_revenue

# Commit
git add models/marts/weekly_revenue.sql
git commit -m "feat(marts): add weekly_revenue aggregation"

# Push
git push origin feature/add-weekly-revenue-model

# Open a PR for review

Jupyter notebooks in git

Challenges

  • Binary-ish format (JSON with embedded outputs)
  • Diffs are painful to read
  • Outputs rarely matter for review

Solutions

  • Clear outputs before commit
  • nbstripout tool
  • Text-based alternative (Jupytext)

Pattern

# Before commit
jupyter nbconvert --clear-output notebook.ipynb
git add notebook.ipynb
git commit -m "analysis: Q2 retention deep-dive"

Merge conflicts

Occur when

Two branches modify the same lines of the same file.

Resolve

  1. Pull latest main
  2. Merge / rebase
  3. Resolve conflict markers <<<<<<<, =======, >>>>>>>
  4. Test
  5. Commit
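
Before the final commit in the steps above, it helps to confirm that no conflict markers were left behind. A minimal sketch follows; `has_conflict_markers` is a hypothetical helper, and the demo file only imitates an unresolved conflict.

```shell
# Hypothetical helper: succeeds (exit 0) if the file still contains a line
# that looks like a conflict marker.
has_conflict_markers() {
    grep -Eq '^(<<<<<<< |=======$|>>>>>>> )' "$1"
}

# Demo on a throwaway file that imitates an unresolved conflict.
demo_file=$(mktemp)
printf '%s\n' 'SELECT user_id,' '<<<<<<< HEAD' '  SUM(revenue)' '=======' \
    '  SUM(net_revenue)' '>>>>>>> main' 'FROM orders' > "$demo_file"

has_conflict_markers "$demo_file" && echo "unresolved conflict markers found"
rm -f "$demo_file"
```

Git has a built-in check as well: `git diff --check` reports leftover conflict markers along with whitespace errors.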

Avoid

  • Pull frequently
  • Small branches
  • Team communication

Reverting

Undo commits

git revert <commit-hash>  # Creates a new commit that undoes the given one
git reset --hard <commit-hash>  # Moves the branch back to that commit and discards later work (destructive)
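
To see that `git revert` adds history instead of rewriting it, here is a small sketch in a throwaway repository. The file name, commit messages and the user identity are placeholders made up for the demo.

```shell
# Throwaway repository; nothing here touches your real projects.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" config user.name "Demo Analyst"
git -C "$repo" config user.email "demo@example.com"

echo "SELECT 1" > "$repo/query.sql"
git -C "$repo" add query.sql
git -C "$repo" commit -q -m "feat(queries): add query"

echo "SELECT 2" > "$repo/query.sql"
git -C "$repo" commit -q -am "fix(queries): change constant"

# Undo the last commit by adding a new one on top of it.
git -C "$repo" revert --no-edit HEAD > /dev/null

cat "$repo/query.sql"                  # prints SELECT 1
git -C "$repo" rev-list --count HEAD   # prints 3
```

The file is back to its first version, yet all three commits stay in the log, which is why revert is the safe choice on shared branches.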

Undo uncommitted changes

git checkout -- file.sql  # Discards uncommitted edits to the file (cannot be undone)
git stash  # Shelves in-progress changes; restore them with git stash pop

Tools

CLI

Built-in. Main interface.

GUI

  • VS Code integrated
  • GitKraken
  • SourceTree
  • GitHub Desktop

Helpful for visual thinkers.

Platforms

  • GitHub
  • GitLab
  • Bitbucket

Usually dictated by your company's choice.

Security

Secrets

Never commit:

  • Passwords
  • API keys
  • Customer data

If you commit one by accident: rotate the key first, then clean it out of the git history.
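
A simple pattern scan catches the most obvious cases before they reach a commit. This is a sketch only: `find_secret_like_lines` is a hypothetical helper, and its pattern is illustrative and far from exhaustive.

```shell
# Hypothetical helper: prints lines (with line numbers) that look like
# hard-coded credentials. Matching is case-insensitive.
find_secret_like_lines() {
    grep -nEi '(password|passwd|api[_-]?key|secret|token)[[:space:]]*[:=][[:space:]]*[^[:space:]]+' "$1"
}

demo_file=$(mktemp)
printf '%s\n' 'DB_HOST=localhost' 'API_KEY=abc123' > "$demo_file"
find_secret_like_lines "$demo_file"   # prints 2:API_KEY=abc123
rm -f "$demo_file"
```

In a pre-commit hook, the staged files to scan can be listed with `git diff --cached --name-only --diff-filter=ACM`. Dedicated scanners go much further than a single regular expression.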

.gitignore

Exclude:

.env
*.pyc
__pycache__
.DS_Store
credentials.json
notebook_outputs/
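
Whether a rule actually matches a file can be verified with `git check-ignore`. A sketch in a throwaway repository, with file names chosen only for the demo:

```shell
repo=$(mktemp -d)
git -C "$repo" init -q
printf '%s\n' '.env' '*.pyc' 'credentials.json' > "$repo/.gitignore"
touch "$repo/.env" "$repo/query.sql"

# A rule matches .env: the path is echoed and the exit code is 0.
git -C "$repo" check-ignore .env
# No rule matches query.sql: nothing is printed and the exit code is 1.
git -C "$repo" check-ignore query.sql || echo "query.sql is not ignored"
```

Adding `-v` to `check-ignore` also shows which `.gitignore` line produced the match.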

Collaboration

1. Clone

git clone <url>

2. Sync

git pull  # Before starting work

3. Work

Feature branch. Commit often.

4. Share

git push

5. Review

PR. Discuss. Iterate.

6. Merge

Approved → merge.

7. Cleanup

Delete branch.

dbt + git + CI

Modern setup:

  • dbt models live in git
  • A PR triggers CI
  • CI runs dbt tests
  • Merge is blocked if they fail
  • Merge → deploy

Best practice for reliable pipelines.

Common mistakes

Not committing often enough

You lose work if your laptop fails, so commit and push regularly.

Vague messages

Twenty commits in a row named "fix" tell a reader nothing.

Force push

Overwrites other people's commits. Don't do it on shared branches.

Committing generated files

Output CSVs, cache files. Bloats repo.

Not pulling

Conflicts multiply.

Analyst-specific

Query versioning

Keep SQL files in git, not just in the BI tool.

Analysis notebooks

Commit even exploratory analysis (EDA). It serves as future reference.

Tracking changes

"Why did the metric definition change?" The git log has the answer.

Time travel

git checkout <old-commit> takes you to an archived state of the project.
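
Both ideas, reading the history of a metric and looking at an old version, can be sketched in a throwaway repository. The file name, queries and identity below are placeholders made up for the demo.

```shell
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" config user.name "Demo Analyst"
git -C "$repo" config user.email "demo@example.com"

echo "SELECT COUNT(*) AS visitors FROM sessions" > "$repo/metric.sql"
git -C "$repo" add metric.sql
git -C "$repo" commit -q -m "feat(metrics): add visitors metric"

echo "SELECT COUNT(DISTINCT user_id) AS visitors FROM sessions" > "$repo/metric.sql"
git -C "$repo" commit -q -am "fix(metrics): count distinct users"

# Why the definition changed: one subject line per commit touching the file,
# newest first.
git -C "$repo" log --format=%s -- metric.sql

# The file as it was one commit ago, without switching branches.
git -C "$repo" show HEAD~1:metric.sql
```

`git show <commit>:<path>` is often handier than a full checkout when only one old query is needed.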

Workflow examples

Simple analyst

  • Main branch: working queries
  • Ad-hoc: feature branches
  • PR to main

dbt team

  • Dev environment (personal)
  • Staging (PR builds)
  • Production (main merge)

Notebooks

  • Per-analyst directory
  • Explore freely
  • Promote to the shared directory when done

Learning

Resources

  • GitHub tutorials
  • Atlassian Git Tutorials
  • Oh Shit, Git!?! (fun troubleshooting)
  • Pro Git book (free online)

Practice

  • Personal projects
  • Contribute to open source
  • Learn while working

In the interview

"Git experience?"

Show:

  • Branching strategy
  • PR process
  • Commit style

"How do you handle conflicts?"

Resolve: understand, edit, test, commit.

".gitignore?"

Exclude secrets, generated files.

Basic fluency expected.

FAQ

Git Flow vs GitHub Flow?

GitHub Flow is simpler (main + feature branches). Git Flow is more formal (develop, release, hotfix branches).

Rebase vs merge?

Merge preserves the full branch history. Rebase rewrites commits to produce a cleaner, linear history. Which one to use is a team preference.

Large files?

Use Git LFS (Large File Storage), for example for datasets.