Joint probability distributions can be extremely large in practice, making inference intractable. Bayes Nets are graphical models which describe joint distributions using local, conditional distributions.
Breakdown. A BN consists of a directed acyclic graph in which an arc from $X_j$ to $X_k$ indicates that $X_k$ is conditionally dependent on $X_j$ (i.e. $X_j$ gives information about $X_k$). Each node in a BN corresponds to a random variable $X_i$ in the joint distribution and stores the distribution of $X_i$ conditioned on its parents.
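As a concrete illustration, this structure can be sketched with plain dictionaries. The two-node network Rain → WetGrass, its probabilities, and the helper `local_prob` are all hypothetical, chosen only to show how each node stores its parents and its conditional distribution:

```python
# A minimal, hypothetical Bayes Net: Rain -> WetGrass, boolean variables.
# Each node stores its parent list and a table Pr[node | parents],
# keyed by a tuple of parent values.
bayes_net = {
    "Rain": {
        "parents": [],
        "cpt": {(): {True: 0.2, False: 0.8}},          # Pr[Rain]
    },
    "WetGrass": {
        "parents": ["Rain"],
        "cpt": {                                        # Pr[WetGrass | Rain]
            (True,):  {True: 0.9, False: 0.1},
            (False,): {True: 0.1, False: 0.9},
        },
    },
}

def local_prob(net, var, value, assignment):
    """Look up Pr[var = value | parents(var)] under a full assignment."""
    parent_vals = tuple(assignment[p] for p in net[var]["parents"])
    return net[var]["cpt"][parent_vals][value]

# Pr[WetGrass = True | Rain = True] is read straight off the node's table.
print(local_prob(bayes_net, "WetGrass", True, {"Rain": True, "WetGrass": True}))  # 0.9
```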
BNs are built on the premise that each node, given its parents, is conditionally independent of its non-descendants. A joint distribution whose dependencies violate the independence assumptions of a given graph cannot be represented by that graph (though a denser graph may be able to represent it).

Since a node is assumed to depend only on its parents, the entire joint distribution is encoded in the following way.
$$ \Pr[x_1, \dots, x_n] = \prod_{i=1}^n \Pr[x_i \mid \mathrm{parents}(x_i)] $$
It is important to note that this is not the chain rule! The chain rule conditions each $x_i$ on all of $x_1, \dots, x_{i-1}$; a BN conditions only on the parents. The factorization comes from the same intuition, in that the joint distribution is built up iteratively from conditional distributions, but it additionally relies on the independence assumptions above to drop the non-parent conditions.
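The factored joint can be evaluated directly by multiplying each node's local conditional. The sketch below assumes a hypothetical dictionary representation of a two-node Rain → WetGrass net; the final sum is a sanity check that the factored distribution is a valid joint:

```python
from itertools import product

# Hypothetical two-node net: Rain -> WetGrass, boolean variables.
net = {
    "Rain":     {"parents": [], "cpt": {(): {True: 0.2, False: 0.8}}},
    "WetGrass": {"parents": ["Rain"],
                 "cpt": {(True,):  {True: 0.9, False: 0.1},
                         (False,): {True: 0.1, False: 0.9}}},
}

def joint_prob(net, assignment):
    """Pr[x_1,...,x_n] as the product of each node's local conditional."""
    p = 1.0
    for var, node in net.items():
        parent_vals = tuple(assignment[q] for q in node["parents"])
        p *= node["cpt"][parent_vals][assignment[var]]
    return p

# Pr[Rain=True, WetGrass=True] = Pr[Rain=True] * Pr[WetGrass=True | Rain=True]
#                              = 0.2 * 0.9 = 0.18
print(joint_prob(net, {"Rain": True, "WetGrass": True}))

# Sanity check: the factored joint sums to 1 over all assignments.
total = sum(joint_prob(net, dict(zip(net, vals)))
            for vals in product([True, False], repeat=len(net)))
```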
BNs do not necessarily encode causality; more often they encode correlation. Moreover, an arc does not even guarantee dependence: it is possible for there to be an arc between $X$ and $Y$ even if they are independent.
D-Separation. Algorithm which determines conditional independence between random variables $X_i$ and $X_j$ from the graph structure alone.
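For intuition about what such an algorithm decides, conditional independence can also be tested by brute force from the full joint table: $X \perp Y \mid Z$ holds iff $\Pr[x, y \mid z] = \Pr[x \mid z] \Pr[y \mid z]$ for every assignment. The sketch below is a hypothetical numeric check over three boolean variables (exponential in the number of variables, unlike the graph-based algorithm), with a common-cause example $Z \to X$, $Z \to Y$ where $X \perp Y \mid Z$ holds by construction:

```python
from itertools import product

def is_cond_independent(joint, tol=1e-9):
    """Check X ⟂ Y | Z for a joint table Pr[x, y, z] keyed by (x, y, z)."""
    for z in [True, False]:
        pz = sum(p for (_, _, zz), p in joint.items() if zz == z)
        if pz == 0:
            continue  # conditioning event has probability 0; nothing to check
        for x, y in product([True, False], repeat=2):
            pxyz = joint.get((x, y, z), 0.0)
            pxz = sum(p for (xx, _, zz), p in joint.items() if xx == x and zz == z)
            pyz = sum(p for (_, yy, zz), p in joint.items() if yy == y and zz == z)
            # Pr[x, y | z] must equal Pr[x | z] * Pr[y | z]
            if abs(pxyz / pz - (pxz / pz) * (pyz / pz)) > tol:
                return False
    return True

# Hypothetical common-cause net Z -> X, Z -> Y: the joint factors as
# Pr[z] * Pr[x | z] * Pr[y | z], so X ⟂ Y | Z by construction.
pz = {True: 0.5, False: 0.5}
px = {True: 0.8, False: 0.3}   # Pr[X = True | z]
py = {True: 0.7, False: 0.4}   # Pr[Y = True | z]
joint = {(x, y, z): pz[z]
                    * (px[z] if x else 1 - px[z])
                    * (py[z] if y else 1 - py[z])
         for x, y, z in product([True, False], repeat=3)}
print(is_cond_independent(joint))  # True
```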