The law of large numbers states that as the sample size $n \rightarrow \infty$, the sample mean $\hat{\mu} = \frac{1}{n}\sum_{i=1}^n X_i = \frac{1}{n}S_n$ converges in probability to the true mean $\mu$ of the underlying distribution: for any $\varepsilon > 0$,
$$ \Pr\left[\left|\frac{1}{n}S_n-\mu\right| \geq \varepsilon\right] \rightarrow 0 \text{ as } n \rightarrow \infty $$
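As a quick numerical sanity check (a minimal sketch in NumPy; the Uniform(0, 1) distribution and the sample sizes are arbitrary choices for illustration), the deviation of the sample mean from $\mu = 0.5$ shrinks as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample mean of i.i.d. Uniform(0, 1) draws; the true mean is mu = 0.5.
mu = 0.5
for n in [10, 1_000, 100_000]:
    samples = rng.uniform(0.0, 1.0, size=n)
    sample_mean = samples.mean()          # (1/n) * S_n
    print(f"n = {n:>7}: |sample mean - mu| = {abs(sample_mean - mu):.5f}")
```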
An estimator $\hat{\theta}$ is a function of the observations used to estimate an unknown quantity $\theta$ (which may itself be a random variable). The bias of an estimator is the difference between the expected value of the estimator $\hat{\theta}$ and the true value $\theta$ of the quantity being estimated. The uncertainty in such an estimate is often quantified using confidence intervals.
$$ \text{Bias}(\hat{\theta}, \theta) := \mathbb{E}[\hat{\theta} - \theta] $$
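For a concrete illustration, the variance estimator that divides by $n$ is a classic biased estimator: its expectation is $\frac{n-1}{n}\sigma^2$, so its bias is $-\sigma^2/n$. A minimal Monte Carlo sketch (the $N(0,1)$ population and sample size $n = 5$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Bias of the "divide by n" variance estimator on N(0, 1) samples of size n = 5.
# True variance theta = 1; theory says E[theta_hat] - theta = -theta / n = -0.2.
n, trials, theta = 5, 200_000, 1.0
data = rng.normal(0.0, 1.0, size=(trials, n))
theta_hat = data.var(axis=1)                 # ddof=0  ->  (1/n) * sum((X_i - X_bar)^2)
print("Monte Carlo bias:", theta_hat.mean() - theta)   # approximately -0.2
print("Theoretical bias:", -theta / n)
```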
The minimum mean square error (MMSE) estimate uses the conditional expectation of a random variable $Y$ given the observations.
$$ \argmin_{\hat{y}} \mathbb{E}[(Y-\hat{y})^2 \space | \space X_1 = x_1, \space \dots, \space X_n = x_n] \\ \Rightarrow \hat{y}_{\text{MMSE}}(x_1, \space \dots, \space x_n) = \mathbb{E}[Y \space | \space X_1 = x_1, \space \dots, \space X_n = x_n] $$
A significant drawback of this estimate is that finding a closed form for $\mathbb{E}[Y \space | \space X_1, \space \dots, \space X_n]$ is generally very difficult.
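One way to see what the conditional expectation buys you is a toy model where it does have a closed form; the model below ($Y = X^2 + Z$ with independent Gaussian noise) is an assumption made purely for illustration, and conditioning on $X \approx x_0$ is a crude empirical approximation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: X ~ Uniform(-1, 1), Y = X**2 + Z with Z ~ N(0, 0.1) independent of X.
# Here the MMSE estimate is known exactly: E[Y | X = x] = x**2.
n = 200_000
x = rng.uniform(-1.0, 1.0, size=n)
y = x**2 + rng.normal(0.0, 0.1, size=n)

x0 = 0.5
near = np.abs(x - x0) < 0.01            # empirical conditioning: keep samples with X close to x0
print("empirical E[Y | X ~ x0]:", y[near].mean())   # roughly 0.25
print("closed form  x0**2     :", x0**2)
```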
The linear least squares estimate (LLSE) uses a linear model to estimate a random variable $Y$ from an observation $X$.
$$ \mathcal{L}(X) = mX + b \hspace{25pt} \argmin_{\mathcal{L}} \mathbb{E}[(Y-\mathcal{L}(X))^2] \\ \Rightarrow \mathcal{L}(X) = \frac{\text{Cov}(X, Y)}{\text{Var}(X)}(X-\mathbb{E}[X]) + \mathbb{E}[Y] $$
The nice thing about this estimate is that only second-order statistics of the joint distribution of the two variables are required (i.e. computing $\text{Cov}(X,Y)$ is much easier than computing $\mathbb{E}[Y \space | \space X]$).
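A minimal sketch of this in practice (the linear-plus-noise model for $Y$ below is an arbitrary choice): estimate $\text{Cov}(X, Y)$, $\text{Var}(X)$, and the means from data, then plug them into the formula above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: Y is (approximately) linear in X, so the LLSE should recover it.
n = 100_000
x = rng.normal(2.0, 1.0, size=n)
y = 3.0 * x + rng.normal(0.0, 0.5, size=n)

C = np.cov(x, y)                       # 2x2 sample covariance matrix
slope = C[0, 1] / C[0, 0]              # Cov(X, Y) / Var(X)
intercept = y.mean() - slope * x.mean()
print(f"L(X) = {slope:.3f} * X + {intercept:.3f}")   # close to 3 * X + 0
```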
The fraction of the variance of $Y$ (which is the error of the minimum mean square estimate when there are no observations, i.e. of the constant estimate $\mathbb{E}[Y]$) that is left unexplained by the estimator is given by the following.