The Kalman Filter is an online, recursive algorithm that provides the minimum mean squared error (MMSE) estimate of a hidden state $X_n$ given observations $Y_{1:n}$, computed as a linear function of those observations.
$$
\begin{aligned}
&\textbf{Hidden States } \{X_n\} \text{ with } X_0 \in \mathbb{R}^d \text{ known} \\
&\textbf{Observations } \{Y_n\} \\
&\textbf{Transition Model } A \in \mathbb{R}^{d \times d} \\
&\textbf{Observation Model } C \in \mathbb{R}^{e \times d} \\
&\textbf{State Noise } \{V_n\} \overset{iid}{\sim} N(0, V) \\
&\textbf{Observation Noise } \{W_n\} \overset{iid}{\sim} N(0, W)
\end{aligned}
$$
Moreover, we assume the noise sequences $\{V_n\}$ and $\{W_n\}$ are independent of each other and of the initial state $X_0$, and we model $X_n$ and $Y_n$ with the following equations:
$$
\begin{aligned}
&\textbf{State-Transition Equation } && X_n = AX_{n-1} + V_n \\
&\textbf{State-Observation Equation } && Y_n = CX_n + W_n
\end{aligned}
$$
This modeling induces the Markov property on the states and, moreover, implies that for $n \geq 1$, the vector formed by $(X_{0:n}, Y_{1:n})$ is jointly Gaussian.
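For concreteness, here is a minimal sketch (in Python, with made-up parameter values) of simulating one trajectory from this model in the scalar case, which is the setting used in the rest of this note; the variable names mirror the symbols above and the numerical values are illustrative assumptions, not part of the course material.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative scalar parameters (assumptions, not from the note)
A, C = 0.9, 1.0        # transition and observation models
V, W = 0.5, 1.0        # state and observation noise variances
X0, n_steps = 0.0, 50  # known initial state and horizon

X = np.empty(n_steps + 1)
Y = np.empty(n_steps + 1)
X[0] = X0      # X_0 is known
Y[0] = np.nan  # observations start at n = 1
for n in range(1, n_steps + 1):
    X[n] = A * X[n - 1] + rng.normal(0.0, np.sqrt(V))  # state-transition equation
    Y[n] = C * X[n] + rng.normal(0.0, np.sqrt(W))      # state-observation equation
```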

To align with the content in the course, this note only covers the algorithm in the scalar case, i.e. $d = e = 1$.
Since the observations and hidden state are jointly Gaussian, the algorithm leverages the fact that the MMSE estimate of $X_n$ given $Y_{1:n}$ coincides with the LLSE estimate:
$$ \mathbb{E}[X_n \mid Y_{1:n}] = \mathbb{L}[X_n \mid Y_{1:n}] $$
First, we adopt the following notation:
$$
\hat{X}_{n \mid n-1} := \mathbb{L}[X_n \mid Y_{1:n-1}] \qquad \hat{X}_{n \mid n} := \mathbb{L}[X_n \mid Y_{1:n}] \\
\hat{\sigma}^2_{n \mid n-1} := \text{var}(X_n - \hat{X}_{n \mid n-1}) \qquad \hat{\sigma}^2_{n \mid n} := \text{var}(X_n - \hat{X}_{n \mid n})
$$
The algorithm proceeds in two steps: prediction and innovation.
$$
\textbf{Prediction } \begin{cases} \hat{X}_{n \mid n-1} = A\hat{X}_{n-1 \mid n-1} \\ \hat{\sigma}^2_{n \mid n-1} = A^2 \hat{\sigma}^2_{n-1 \mid n-1} + V \end{cases}
$$
$$
\textbf{Innovation } \begin{cases} \hat{X}_{n \mid n} = \hat{X}_{n \mid n-1} + \frac{C\hat{\sigma}^2_{n \mid n-1}}{C^2 \hat{\sigma}^2_{n \mid n-1} + W}\left(Y_n - C\hat{X}_{n \mid n-1}\right) \\ \hat{\sigma}^2_{n \mid n} = \left(1 - \frac{C\hat{\sigma}^2_{n \mid n-1}}{C^2 \hat{\sigma}^2_{n \mid n-1} + W}\, C\right)\hat{\sigma}^2_{n \mid n-1} \end{cases}
$$
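The recursion translates directly into code. Below is a minimal sketch in Python of a single prediction/innovation update in the scalar case; the function name `kalman_update` is a hypothetical choice for illustration, not notation from the course.

```python
def kalman_update(x_prev, sigma2_prev, y_n, A, C, V, W):
    """One scalar Kalman filter step: maps (X_hat_{n-1|n-1}, sigma2_{n-1|n-1}, Y_n)
    to (X_hat_{n|n}, sigma2_{n|n})."""
    # Prediction
    x_pred = A * x_prev                   # X_hat_{n|n-1}
    sigma2_pred = A**2 * sigma2_prev + V  # sigma2_{n|n-1}

    # Innovation
    K = C * sigma2_pred / (C**2 * sigma2_pred + W)  # Kalman gain K_n
    x_filt = x_pred + K * (y_n - C * x_pred)        # X_hat_{n|n}
    sigma2_filt = (1 - K * C) * sigma2_pred         # sigma2_{n|n}
    return x_filt, sigma2_filt
```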
The term $K_n := \frac{C\hat{\sigma}^2_{n \mid n-1}}{C^2\hat{\sigma}^2_{n \mid n-1} + W}$ is known as the Kalman gain and determines how much we correct the predicted state $\hat{X}_{n \mid n-1}$ based on the new information in the observation, captured by the innovation $\tilde{Y}_n := Y_n - C\hat{X}_{n \mid n-1}$.
$$
\hat{X}_{n \mid n} = \underbrace{\hat{X}_{n \mid n-1}}_{\text{Prediction}} + \underbrace{K_n}_{\text{Kalman Gain}} \cdot \underbrace{\tilde{Y}_n}_{\text{New Information in } Y_n}
$$
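As a usage sketch (assuming the simulation and `kalman_update` snippets above are in scope), the filter is run by iterating the update over the observations; since $X_0$ is known, initializing with $\hat{X}_{0 \mid 0} = X_0$ and $\hat{\sigma}^2_{0 \mid 0} = 0$ is a natural choice.

```python
# Initialization (an assumption consistent with X_0 being known exactly)
x_hat, sigma2 = X0, 0.0
estimates = [x_hat]
for n in range(1, n_steps + 1):
    x_hat, sigma2 = kalman_update(x_hat, sigma2, Y[n], A, C, V, W)
    estimates.append(x_hat)

# The gain controls the correction: when W is large (noisy observations) K_n is small
# and the filter leans on the prediction; when W is small, K_n approaches 1/C and the
# filter leans on the new observation Y_n.
```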