Markov chains in the Data Scientist interview
Why it comes up in interviews
Markov chains underpin many ML and statistics methods. In DS interviews expect questions on the Markov property and MCMC.
Markov property
The future depends only on the current state, not on the past.
P(x_{t+1} | x_t, x_{t-1}, ..., x_1) = P(x_{t+1} | x_t)
A "memoryless" process.
Transition matrix
P[i, j] — the probability of transitioning from state i to state j.
States: A, B, C
P = [[0.7, 0.2, 0.1], # from A
[0.3, 0.5, 0.2], # from B
[0.1, 0.4, 0.5]] # from C
Each row sums to 1. The state distribution after n steps: π_n = π_0 · P^n.
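A minimal sketch of π_n = π_0 · P^n with the example matrix above (the starting distribution and step count are illustrative):

```python
import numpy as np

# Transition matrix from the example above (states A, B, C)
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.1, 0.4, 0.5]])

# Sanity check: each row is a probability distribution
assert np.allclose(P.sum(axis=1), 1.0)

# Start in state A with certainty
pi_0 = np.array([1.0, 0.0, 0.0])

# Distribution after n steps: pi_n = pi_0 @ P^n
pi_10 = pi_0 @ np.linalg.matrix_power(P, 10)
print(pi_10)  # already close to the long-run distribution
```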
Stationary distribution
The long-run distribution π, satisfying π · P = π.
import numpy as np

P = np.array(P)  # the transition matrix above, as an ndarray
eigenvalues, eigenvectors = np.linalg.eig(P.T)
# Eigenvector of P.T for eigenvalue 1; take the real part and normalize
stationary = eigenvectors[:, np.argmin(abs(eigenvalues - 1))].real
stationary = stationary / stationary.sum()
PageRank is the stationary distribution of a Markov chain on the web graph.
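The eigenvector result can be cross-checked by power iteration: repeatedly apply P until the distribution stops changing (a self-contained sketch with the example matrix; the iteration count is arbitrary):

```python
import numpy as np

P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.1, 0.4, 0.5]])

# Power iteration: keep stepping the chain until the distribution converges
pi = np.array([1.0, 0.0, 0.0])
for _ in range(1000):
    pi = pi @ P

assert np.allclose(pi @ P, pi)  # fixed point: pi is stationary
print(pi)
```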
Hidden Markov Models
The state is not observed directly; observations depend on the state.
Hidden states: x_t (form a Markov chain).
Observations: y_t (depend on x_t).
Algorithms:
- Forward-backward. Compute P(x_t | y_1..T).
- Viterbi. The most likely sequence of states.
- Baum-Welch (EM). Estimate the model parameters.
Applications: speech recognition (legacy), genomics, POS tagging in NLP.
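A sketch of Viterbi on a toy weather HMM; all parameters (transition, emission, initial probabilities) are made up for illustration, not from a real model:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden-state sequence for an observation sequence.

    pi: initial state probabilities, A: transition matrix,
    B[s, o]: probability of emitting observation o in state s.
    """
    n_states, T = A.shape[0], len(obs)
    # delta[t, s]: log-prob of the best path ending in state s at time t
    delta = np.full((T, n_states), -np.inf)
    back = np.zeros((T, n_states), dtype=int)
    delta[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        scores = delta[t - 1][:, None] + np.log(A)  # scores[i, j]: best path i -> j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
    # Backtrack from the best final state
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy HMM: hidden {0: rainy, 1: sunny}, observed {0: umbrella, 1: no umbrella}
pi = np.array([0.5, 0.5])
A = np.array([[0.7, 0.3], [0.3, 0.7]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi([0, 0, 1], pi, A, B))  # → [0, 0, 1]: rainy, rainy, sunny
```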
MCMC
Markov Chain Monte Carlo. Sampling from intractable distributions.
Idea. Build a Markov chain whose stationary distribution is the target. Samples from the chain → samples from the target.
Algorithms:
- Metropolis-Hastings. Propose, then accept/reject.
- Gibbs sampling. Sample one variable at a time.
- HMC (Hamiltonian Monte Carlo). Better on continuous distributions.
- NUTS (No-U-Turn Sampler). Adaptive HMC; the default in PyMC and Stan.
Used in Bayesian inference when the posterior is intractable.
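A minimal random-walk Metropolis sketch; the target (a standard normal known only up to its normalizing constant), step size, and burn-in are illustrative choices:

```python
import numpy as np

def metropolis_hastings(log_target, n_samples, step=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis: sample from a density known up to a constant."""
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        proposal = x + rng.normal(scale=step)  # symmetric proposal
        # Accept with probability min(1, target(proposal) / target(x))
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)
    return np.array(samples)

# Target: standard normal, up to the normalizing constant
log_target = lambda x: -0.5 * x**2
samples = metropolis_hastings(log_target, 20000)
# Discard burn-in; the rest behaves like draws from the target
print(samples[5000:].mean(), samples[5000:].std())  # ≈ 0 and 1
```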
Related topics
- Bayesian methods for DS
- Reinforcement learning for DS
- Time series CV and features for DS
- Stable Diffusion and GANs for DS
- Preparing for the Data Scientist interview
FAQ
Is this official information?
No. The article is based on classical works (Markov 1906, Metropolis 1953, Hastings 1970).