Statistical Inference

Statistical inference is the process of positing a statistical model $p_\theta$ assumed to generate the observed data $X$ and inferring properties of its parameters $\theta$.

Bayesian Framework

In the Bayesian framework, $\theta$ is treated as a random variable with prior distribution $p(\theta)$. We assume the data are generated according to $p(X \mid \theta)$, called the conditional distribution, and use the posterior $p(\theta \mid X)$ to make inferences about $\theta$.

$$ p(\theta), \space p(X \mid \theta) \rightarrow \fbox{\text{data generation}} \rightarrow \{x_n\} $$

$$ \{x_n\}, \space p(\theta \mid X) \rightarrow \fbox{\text{inference}} $$

Given a conditional $p(X \mid \theta)$, if the prior $p(\theta)$ lies in the same distribution family as the posterior $p(\theta \mid X)$, then $p(\theta)$ is called a conjugate prior for $p(X \mid \theta)$.
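
As a standard example (not developed above), the Beta distribution is conjugate to the Bernoulli conditional: if $p(\theta) = \mathrm{Beta}(\alpha, \beta)$ and each $x_n \sim \mathrm{Bernoulli}(\theta)$, the posterior is again a Beta distribution, $\mathrm{Beta}(\alpha + \sum_n x_n, \space \beta + N - \sum_n x_n)$. A minimal sketch; the hyperparameters $\alpha = \beta = 2$, the sample size $N$, and the seed are arbitrary illustrative choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Prior: theta ~ Beta(alpha, beta); hyperparameters are illustrative.
alpha, beta = 2.0, 2.0

# Data generation: draw theta from the prior, then x_n ~ Bernoulli(theta).
theta_true = rng.beta(alpha, beta)
N = 100
x = rng.binomial(1, theta_true, size=N)

# Conjugacy: the posterior is Beta(alpha + sum x_n, beta + N - sum x_n).
post_alpha = alpha + x.sum()
post_beta = beta + N - x.sum()
posterior = stats.beta(post_alpha, post_beta)

lo, hi = posterior.interval(0.95)
print(f"true theta:             {theta_true:.3f}")
print(f"posterior mean:         {posterior.mean():.3f}")
print(f"95% credible interval:  ({lo:.3f}, {hi:.3f})")
```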

Frequentist Framework

In the Frequentist framework, $\theta$ is treated as a fixed, unknown parameter. We assume the data are generated according to $p(X;\theta)$, called the likelihood, and use the likelihood to make inferences about $\theta$.

$$ p(X;\theta) \rightarrow \fbox{\text{data generation}} \rightarrow \{x_n\} $$

$$ \{x_n\}, \space p(X;\theta) \rightarrow \fbox{\text{inference}} $$
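
One common frequentist procedure (though not the only one) is maximum likelihood: choose the $\theta$ that maximizes $p(X;\theta)$ at the observed data. A minimal sketch for a Bernoulli model, where the Bernoulli choice, the true parameter $0.3$, and the sample size are illustrative assumptions; the numerical maximizer can be checked against the closed-form MLE, the sample mean $\frac{1}{N}\sum_n x_n$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

# Fixed (unknown) parameter; data generated according to p(X; theta).
theta_true = 0.3
N = 100
x = rng.binomial(1, theta_true, size=N)

# Negative log-likelihood of a Bernoulli sample.
def neg_log_likelihood(theta):
    return -(x.sum() * np.log(theta) + (N - x.sum()) * np.log(1 - theta))

# Maximize the likelihood numerically over (0, 1).
res = minimize_scalar(neg_log_likelihood,
                      bounds=(1e-6, 1 - 1e-6), method="bounded")

print(f"numerical MLE:                 {res.x:.4f}")
print(f"closed-form MLE (sample mean): {x.mean():.4f}")
```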


Point Estimates

An estimate $\hat{\theta}$ of $\theta$ that gives a single numerical value is called a point estimate.

Bayes Estimators

Estimators that minimize the Bayes risk, the expected loss taken jointly over parameters and data, are called Bayes estimators. Equivalently, for each observed $X$, a Bayes estimator minimizes the expected posterior loss $\mathbb{E}_{\theta \sim p(\theta \mid X)}[f(\hat{\theta}, \theta)]$.

$$ \theta_f=\argmin_{\hat{\theta}(\cdot)} \mathbb{E}_{\theta \sim p(\theta), X \sim p(X \mid \theta)} [f(\hat{\theta}(X), \theta)] $$
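
Under squared-error loss $f(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$, the Bayes estimator is the posterior mean (a standard result, stated here without proof). A minimal Monte Carlo sketch under the Beta-Bernoulli model above, estimating the Bayes risk of the posterior mean against that of the plain sample mean; all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, N, trials = 2.0, 2.0, 10, 100_000

# Sample (theta, X) from the joint: theta ~ p(theta), X ~ p(X | theta).
thetas = rng.beta(alpha, beta, size=trials)
successes = rng.binomial(N, thetas)

# Posterior mean under the conjugate Beta posterior (the Bayes
# estimator for squared loss) versus the plain sample mean.
bayes_est = (alpha + successes) / (alpha + beta + N)
sample_mean = successes / N

print(f"risk of posterior mean: {np.mean((bayes_est - thetas) ** 2):.5f}")
print(f"risk of sample mean:    {np.mean((sample_mean - thetas) ** 2):.5f}")
```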

Maximum a Posteriori Estimate

The maximum a posteriori estimate $\theta_{MAP}$ is obtained by choosing the parameter value that maximizes the posterior density. Since $p(\theta \mid X) \propto p(X \mid \theta) \, p(\theta)$, it can be computed without the normalizing constant:

$$ \theta_{MAP} = \argmax_{\theta} \space p(\theta \mid X) = \argmax_{\theta} \space p(X \mid \theta) \, p(\theta) $$
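
Continuing the Beta-Bernoulli sketch, the posterior $\mathrm{Beta}(a, b)$ has mode $\frac{a-1}{a+b-2}$ for $a, b > 1$, so a numerical maximizer of the posterior density should recover it; the constants below are the same illustrative choices as before:

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

alpha, beta, N = 2.0, 2.0, 100
theta_true = rng.beta(alpha, beta)
x = rng.binomial(1, theta_true, size=N)

# Posterior Beta(a, b) under the conjugate update.
a, b = alpha + x.sum(), beta + N - x.sum()

# MAP: maximize the posterior density (minimize its negative log).
res = minimize_scalar(lambda t: -stats.beta.logpdf(t, a, b),
                      bounds=(1e-6, 1 - 1e-6), method="bounded")

print(f"numerical MAP:                  {res.x:.4f}")
print(f"closed-form mode (a-1)/(a+b-2): {(a - 1) / (a + b - 2):.4f}")
```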