A hidden Markov model (HMM) is a stochastic model in which a Markov process of discrete hidden states $\{X_n\}$ is inferred from an observable process $\{Y_n\}$.
$$ \textbf{Emission Probabilities }Q(x, y):=P(Y_n =y \mid X_n = x) \hspace{41pt} \\ \textbf{Transition Probabilities } P(x_n, x_{n+1}) := P(X_{n+1} = x_{n+1} \mid X_n = x_n) \\ \textbf{Initial Distribution } \pi _0(x) := P(X_0 = x) \hspace{70pt} $$
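A concrete HMM is fully specified by the three quantities above. As a minimal sketch, here is a hypothetical two-state model in NumPy; all of the numbers are illustrative placeholders, not values from the text:

```python
import numpy as np

# Hypothetical two-state HMM (states: 0 = rainy, 1 = sunny;
# observations: 0 = umbrella, 1 = no umbrella). Numbers are illustrative.
pi0 = np.array([0.6, 0.4])        # initial distribution pi_0(x) = P(X_0 = x)
P = np.array([[0.7, 0.3],         # transition P(x_n, x_{n+1}), row-stochastic
              [0.4, 0.6]])
Q = np.array([[0.9, 0.1],         # emission Q(x, y) = P(Y_n = y | X_n = x)
              [0.2, 0.8]])

# Each row of P and Q is a conditional distribution, so rows sum to 1.
assert np.allclose(P.sum(axis=1), 1.0)
assert np.allclose(Q.sum(axis=1), 1.0)
```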

The Kalman Filter is a Special Case of an HMM. The Kalman filter is an HMM in which the states are continuous and the transition and emission probabilities are linear and Gaussian.
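Concretely, in that setting the state and observation equations take the familiar linear-Gaussian form (the matrices $A$, $C$ and covariances $\Sigma_w$, $\Sigma_v$ are standard notation, not symbols defined in this text):
$$ X_{n+1} = A X_n + w_n, \quad w_n \sim \mathcal{N}(0, \Sigma_w) \\ Y_n = C X_n + v_n, \quad v_n \sim \mathcal{N}(0, \Sigma_v) $$
so the transition and emission probabilities $P$ and $Q$ become Gaussian densities rather than tables over discrete states.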
A trellis diagram is a convenient way of representing the possible state transitions at each timestep.

Trellis Diagram. Each level in this graph corresponds to the set of states reachable from the possible states at the previous timestep. The weight of each arc $d_{n}(x_n, x_{n+1})$ is related to the probability of transitioning from $x_n$ to $x_{n+1}$ multiplied by the likelihood of observing $Y_{n+1}$ given $X_{n+1} = x_{n+1}$. Its explicit definition is a convenience for the Viterbi algorithm, discussed below.
The Decoding Problem. Given a sequence of observations $Y_{0:n}$, along with $P, Q, \pi_0$, what is the most likely sequence of hidden states $X_{0:n}$?
$$ \hat{X}_{0:n} = \argmax_{x_{0:n}} P(X_{0:n} = x_{0:n} \mid Y_{0:n} = y_{0:n}) $$
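This argmax can in principle be computed by brute force: enumerate every candidate state sequence and score its joint probability with the observations. The sketch below does exactly that for a hypothetical two-state model (all numbers are illustrative); the cost is exponential in $n$, which is what motivates the Viterbi algorithm discussed next.

```python
import itertools
import numpy as np

# Illustrative model parameters (placeholders, not from the text).
pi0 = np.array([0.6, 0.4])
P = np.array([[0.7, 0.3], [0.4, 0.6]])
Q = np.array([[0.9, 0.1], [0.2, 0.8]])
y = [0, 0, 1]  # an observed sequence y_{0:2}

def joint(x, y):
    """Joint probability P(X_{0:n} = x, Y_{0:n} = y)."""
    p = pi0[x[0]] * Q[x[0], y[0]]
    for m in range(1, len(y)):
        p *= P[x[m - 1], x[m]] * Q[x[m], y[m]]
    return p

# Maximizing the joint over x is equivalent to maximizing the posterior,
# since P(Y_{0:n} = y) does not depend on x.
best = max(itertools.product(range(2), repeat=len(y)),
           key=lambda x: joint(x, y))
print(best)  # → (0, 0, 1)
```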
The Viterbi algorithm is an offline algorithm which solves the decoding problem by finding the shortest path in the trellis diagram rolled out to timestep $n$.
$$ \fbox{\text{$\varphi_m(x_m) = \min_{x_{m-1}} \left[ \varphi_{m-1}(x_{m-1}) + d_m(x_{m-1}, x_m) \right] \text{ with } \varphi_0(x_0) = d_0(x_0)$}} \\ \fbox{\text{$\hat{X}_m = \argmin_{x_m} \varphi_m(x_m)$}} \\ \text{where } \begin{cases} d_m(x_{m-1}, x_m) := -\log\big(P(x_{m-1}, x_m)\, Q(x_m, y_m)\big) \\ d_0(x_0) := -\log\big(\pi_0(x_0)\, Q(x_0, y_0)\big) \end{cases} $$
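A minimal sketch of this recursion in NumPy, assuming the same hypothetical two-state model used above (the parameter values are illustrative placeholders): arc weights are the negative log-probabilities $d_m$, so the most likely sequence is the shortest path through the trellis, recovered by backtracking the stored argmins.

```python
import numpy as np

def viterbi(pi0, P, Q, y):
    """Shortest path through the trellis with arc weights
    d_m(x_{m-1}, x_m) = -log(P(x_{m-1}, x_m) Q(x_m, y_m))."""
    phi = -np.log(pi0 * Q[:, y[0]])              # phi_0(x_0) = d_0(x_0)
    back = []                                    # argmin pointers per step
    for obs in y[1:]:
        # cost[i, j] = phi_{m-1}(i) + d_m(i, j)
        cost = phi[:, None] - np.log(P * Q[:, obs][None, :])
        back.append(cost.argmin(axis=0))
        phi = cost.min(axis=0)                   # phi_m(x_m)
    # Terminal state minimizes phi_n; backtrack for the full path.
    x = [int(phi.argmin())]
    for ptr in reversed(back):
        x.append(int(ptr[x[-1]]))
    return x[::-1]

# Illustrative model (placeholders, not from the text).
pi0 = np.array([0.6, 0.4])
P = np.array([[0.7, 0.3], [0.4, 0.6]])
Q = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi(pi0, P, Q, [0, 0, 1]))  # → [0, 0, 1]
```

Working in negative log space turns products of probabilities into sums of arc weights and avoids numerical underflow on long sequences.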