Joint probability distributions can be extremely large in practice, making inference intractable. Bayes Nets are graphical models which describe joint distributions using local, conditional distributions.
Breakdown. A BN consists of a directed acyclic graph in which an arc from $X_j$ to $X_k$ indicates that $X_k$ is conditionally dependent on $X_j$ (i.e. $X_j$ gives information about $X_k$). Each node in a BN corresponds to a random variable $X_i$ in the joint distribution and stores the distribution of $X_i$ conditioned on its parents.
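As a concrete illustration, this structure can be sketched with plain dictionaries. The two-node network Rain → WetGrass, its probabilities, and the helper `local_prob` are all hypothetical, chosen only to show how each node stores its parents and its conditional distribution:

```python
# A minimal, hypothetical Bayes Net: Rain -> WetGrass, boolean variables.
# Each node stores its parent list and a table Pr[node | parents],
# keyed by a tuple of parent values.
bayes_net = {
    "Rain": {
        "parents": [],
        "cpt": {(): {True: 0.2, False: 0.8}},          # Pr[Rain]
    },
    "WetGrass": {
        "parents": ["Rain"],
        "cpt": {                                        # Pr[WetGrass | Rain]
            (True,):  {True: 0.9, False: 0.1},
            (False,): {True: 0.1, False: 0.9},
        },
    },
}

def local_prob(net, var, value, assignment):
    """Look up Pr[var = value | parents(var)] under a full assignment."""
    parent_vals = tuple(assignment[p] for p in net[var]["parents"])
    return net[var]["cpt"][parent_vals][value]

# Pr[WetGrass = True | Rain = True] is read straight off the node's table.
print(local_prob(bayes_net, "WetGrass", True, {"Rain": True, "WetGrass": True}))  # 0.9
```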
BNs are built on the premise that each node, given its parents, is conditionally independent of its non-descendants. A joint distribution whose dependencies violate the independence assumptions of a given graph cannot be represented by that graph (though a denser graph may be able to represent it).

Since a node is assumed to depend only on its parents, the entire joint distribution is encoded in the following way.
$$ \Pr[x_1, \dots, x_n] = \prod_{i=1}^n \Pr[x_i \mid \mathrm{parents}(x_i)] $$
It is important to note that this is not the chain rule! The chain rule conditions each $x_i$ on all of $x_1, \dots, x_{i-1}$; a BN conditions only on the parents. The factorization comes from the same intuition, in that the joint distribution is built up iteratively from conditional distributions, but it additionally relies on the independence assumptions above to drop the non-parent conditions.
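The factored joint can be evaluated directly by multiplying each node's local conditional. The sketch below assumes a hypothetical dictionary representation of a two-node Rain → WetGrass net; the final sum is a sanity check that the factored distribution is a valid joint:

```python
from itertools import product

# Hypothetical two-node net: Rain -> WetGrass, boolean variables.
net = {
    "Rain":     {"parents": [], "cpt": {(): {True: 0.2, False: 0.8}}},
    "WetGrass": {"parents": ["Rain"],
                 "cpt": {(True,):  {True: 0.9, False: 0.1},
                         (False,): {True: 0.1, False: 0.9}}},
}

def joint_prob(net, assignment):
    """Pr[x_1,...,x_n] as the product of each node's local conditional."""
    p = 1.0
    for var, node in net.items():
        parent_vals = tuple(assignment[q] for q in node["parents"])
        p *= node["cpt"][parent_vals][assignment[var]]
    return p

# Pr[Rain=True, WetGrass=True] = Pr[Rain=True] * Pr[WetGrass=True | Rain=True]
#                              = 0.2 * 0.9 = 0.18
print(joint_prob(net, {"Rain": True, "WetGrass": True}))

# Sanity check: the factored joint sums to 1 over all assignments.
total = sum(joint_prob(net, dict(zip(net, vals)))
            for vals in product([True, False], repeat=len(net)))
```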
BNs do not necessarily encode causality; more often they encode correlation. Moreover, an arc does not even guarantee dependence: it is possible for there to be an arc between $X$ and $Y$ even if they are independent.
D-Separation. Algorithm which determines conditional independence between random variables $X_i$ and $X_j$ from the graph structure alone.
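For intuition about what such an algorithm decides, conditional independence can also be tested by brute force from the full joint table: $X \perp Y \mid Z$ holds iff $\Pr[x, y \mid z] = \Pr[x \mid z] \Pr[y \mid z]$ for every assignment. The sketch below is a hypothetical numeric check over three boolean variables (exponential in the number of variables, unlike the graph-based algorithm), with a common-cause example $Z \to X$, $Z \to Y$ where $X \perp Y \mid Z$ holds by construction:

```python
from itertools import product

def is_cond_independent(joint, tol=1e-9):
    """Check X ⟂ Y | Z for a joint table Pr[x, y, z] keyed by (x, y, z)."""
    for z in [True, False]:
        pz = sum(p for (_, _, zz), p in joint.items() if zz == z)
        if pz == 0:
            continue  # conditioning event has probability 0; nothing to check
        for x, y in product([True, False], repeat=2):
            pxyz = joint.get((x, y, z), 0.0)
            pxz = sum(p for (xx, _, zz), p in joint.items() if xx == x and zz == z)
            pyz = sum(p for (_, yy, zz), p in joint.items() if yy == y and zz == z)
            # Pr[x, y | z] must equal Pr[x | z] * Pr[y | z]
            if abs(pxyz / pz - (pxz / pz) * (pyz / pz)) > tol:
                return False
    return True

# Hypothetical common-cause net Z -> X, Z -> Y: the joint factors as
# Pr[z] * Pr[x | z] * Pr[y | z], so X ⟂ Y | Z by construction.
pz = {True: 0.5, False: 0.5}
px = {True: 0.8, False: 0.3}   # Pr[X = True | z]
py = {True: 0.7, False: 0.4}   # Pr[Y = True | z]
joint = {(x, y, z): pz[z]
                    * (px[z] if x else 1 - px[z])
                    * (py[z] if y else 1 - py[z])
         for x, y, z in product([True, False], repeat=3)}
print(is_cond_independent(joint))  # True
```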