Chain rule (probability)

In probability theory, the chain rule (also called the general product rule[1][2]) permits the calculation of any member of the joint distribution of a set of random variables using only conditional probabilities. The rule is useful in the study of Bayesian networks, which describe a probability distribution in terms of conditional probabilities.

Chain rule for events

Two events

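For two events ${\displaystyle A}$ and ${\displaystyle B}$ with ${\displaystyle \mathrm {P} (B)>0}$, the chain rule states that

${\displaystyle \mathrm {P} (A\cap B)=\mathrm {P} (A\mid B)\cdot \mathrm {P} (B)}$.

Because ${\displaystyle A\cap B=B\cap A}$, the roles of the two events can be exchanged, which gives the equivalent form ${\displaystyle \mathrm {P} (A\cap B)=\mathrm {P} (B\mid A)\cdot \mathrm {P} (A)}$ whenever ${\displaystyle \mathrm {P} (A)>0}$.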

Example

This rule is illustrated in the following example. Urn 1 contains 1 black ball and 2 white balls, and Urn 2 contains 1 black ball and 3 white balls. Suppose we pick an urn at random and then draw a ball from that urn. Let event ${\displaystyle A}$ be choosing the first urn, so that ${\displaystyle \mathrm {P} (A)=\mathrm {P} ({\overline {A}})=1/2}$. Let event ${\displaystyle B}$ be drawing a white ball. The probability of drawing a white ball, given that the first urn was chosen, is ${\displaystyle \mathrm {P} (B\mid A)=2/3}$. The event ${\displaystyle A\cap B}$ is the intersection of the two: choosing the first urn and drawing a white ball from it. Its probability follows from the chain rule:

${\displaystyle \mathrm {P} (A\cap B)=\mathrm {P} (B\mid A)\mathrm {P} (A)=2/3\times 1/2=1/3}$.
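The arithmetic of the urn example can be checked with a short Python sketch. It uses exact fractions and, as a cross-check, enumerates every (urn, ball) outcome directly. The variable names and the dictionary layout are illustrative choices, not part of the original example.

```python
from fractions import Fraction

# Probabilities taken from the urn example above.
p_a = Fraction(1, 2)          # P(A): urn 1 is chosen
p_b_given_a = Fraction(2, 3)  # P(B | A): a white ball is drawn, given urn 1

# Chain rule: P(A and B) = P(B | A) * P(A)
p_a_and_b = p_b_given_a * p_a

# Cross-check by enumeration: each urn has probability 1/2, and each ball
# inside the chosen urn is equally likely to be drawn.
urns = {
    1: ["black", "white", "white"],
    2: ["black", "white", "white", "white"],
}
p_enumerated = sum(
    Fraction(1, 2) * Fraction(1, len(balls))
    for urn, balls in urns.items()
    for ball in balls
    if urn == 1 and ball == "white"
)

print(p_a_and_b)     # 1/3
print(p_enumerated)  # 1/3
```

Both routes give 1/3, matching the value obtained from the chain rule.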

More than two events

For more than two events ${\displaystyle A_{1},\ldots ,A_{n}}$ the chain rule extends to the formula

${\displaystyle \mathrm {P} (A_{n}\cap \ldots \cap A_{1})=\mathrm {P} (A_{n}|A_{n-1}\cap \ldots \cap A_{1})\cdot \mathrm {P} (A_{n-1}\cap \ldots \cap A_{1})}$

which by induction may be turned into

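${\displaystyle \mathrm {P} (A_{n}\cap \ldots \cap A_{1})=\prod _{k=1}^{n}\mathrm {P} \left(A_{k}\,{\Bigg |}\,\bigcap _{j=1}^{k-1}A_{j}\right)}$.

For ${\displaystyle k=1}$ the intersection on the right-hand side is empty, and the corresponding factor is read as the unconditional probability ${\displaystyle \mathrm {P} (A_{1})}$.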

Example

With four events (${\displaystyle n=4}$), the chain rule is

${\displaystyle {\begin{aligned}\mathrm {P} (A_{4}\cap A_{3}\cap A_{2}\cap A_{1})&=\mathrm {P} (A_{4}\mid A_{3}\cap A_{2}\cap A_{1})\cdot \mathrm {P} (A_{3}\cap A_{2}\cap A_{1})\\&=\mathrm {P} (A_{4}\mid A_{3}\cap A_{2}\cap A_{1})\cdot \mathrm {P} (A_{3}\mid A_{2}\cap A_{1})\cdot \mathrm {P} (A_{2}\cap A_{1})\\&=\mathrm {P} (A_{4}\mid A_{3}\cap A_{2}\cap A_{1})\cdot \mathrm {P} (A_{3}\mid A_{2}\cap A_{1})\cdot \mathrm {P} (A_{2}\mid A_{1})\cdot \mathrm {P} (A_{1})\end{aligned}}}$
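The four-event expansion can be verified numerically on a small finite sample space. The following Python sketch assumes a hypothetical setup that does not appear in the text above: four balls are drawn without replacement from an urn holding 4 red and 2 blue balls, and ${\displaystyle A_{k}}$ is the event that the ${\displaystyle k}$-th ball drawn is red. The helper `chain_rule` is an illustrative name, and it assumes every conditioning event has positive probability.

```python
from fractions import Fraction
from itertools import permutations

def chain_rule(events, space):
    """Multiply P(A_k | A_1 and ... and A_{k-1}) for k = 1..n on a uniform space.

    Each event is a set of outcomes. The empty intersection (k = 1) is the
    whole sample space, so the first factor is the unconditional P(A_1).
    """
    result = Fraction(1)
    seen = set(space)  # intersection of the events processed so far
    for event in events:
        result *= Fraction(len(event & seen), len(seen))
        seen &= event
    return result

# Hypothetical urn: balls 0-3 are red, balls 4-5 are blue.
red = {0, 1, 2, 3}
outcomes = set(permutations(range(6), 4))  # ordered draws without replacement

# A_k: the k-th ball drawn is red (k = 1..4).
events = [{o for o in outcomes if o[k] in red} for k in range(4)]

via_chain_rule = chain_rule(events, outcomes)
direct = Fraction(len(set.intersection(*events)), len(outcomes))

print(via_chain_rule)  # 1/15
print(direct)          # 1/15
```

The four factors are 4/6, 3/5, 2/4 and 1/3, and their product agrees with the probability obtained by counting the outcomes in the intersection directly.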

Chain rule for random variables

Two random variables

For two random variables ${\displaystyle X,Y}$, to find the joint distribution, we can apply the definition of conditional probability to obtain:

${\displaystyle \mathrm {P} (X,Y)=\mathrm {P} (X\mid Y)\cdot \mathrm {P} (Y)}$
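As a minimal sketch of the two-variable case, the Python snippet below builds a joint table from a marginal ${\displaystyle \mathrm {P} (Y)}$ and a conditional table ${\displaystyle \mathrm {P} (X\mid Y)}$. The variables (weather and umbrella use) and all numbers are invented for illustration only.

```python
from fractions import Fraction

# Hypothetical marginal distribution P(Y).
p_y = {"rain": Fraction(3, 10), "dry": Fraction(7, 10)}

# Hypothetical conditional distribution P(X | Y), one row per value of Y.
p_x_given_y = {
    "rain": {"umbrella": Fraction(9, 10), "none": Fraction(1, 10)},
    "dry": {"umbrella": Fraction(1, 5), "none": Fraction(4, 5)},
}

# Chain rule: P(X = x, Y = y) = P(X = x | Y = y) * P(Y = y)
joint = {
    (x, y): p_x_given_y[y][x] * p_y[y]
    for y in p_y
    for x in p_x_given_y[y]
}

print(joint[("umbrella", "rain")])  # 27/100
print(sum(joint.values()))          # 1
```

Summing the joint table over all values of ${\displaystyle X}$ for a fixed ${\displaystyle y}$ recovers ${\displaystyle \mathrm {P} (Y=y)}$, which is a quick consistency check on such a construction.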

More than two random variables

Consider an indexed collection of random variables ${\displaystyle X_{1},\ldots ,X_{n}}$. To find the value of this member of the joint distribution, we can apply the definition of conditional probability to obtain:

${\displaystyle \mathrm {P} (X_{n},\ldots ,X_{1})=\mathrm {P} (X_{n}|X_{n-1},\ldots ,X_{1})\cdot \mathrm {P} (X_{n-1},\ldots ,X_{1})}$

Repeating this process with each final term creates the product:

${\displaystyle \mathrm {P} (X_{n},\ldots ,X_{1})=\prod _{k=1}^{n}\mathrm {P} (X_{k}\mid X_{k-1},\ldots ,X_{1})}$

Example

With four variables (${\displaystyle n=4}$), the chain rule produces this product of conditional probabilities:

${\displaystyle {\begin{aligned}\mathrm {P} (X_{4},X_{3},X_{2},X_{1})&=\mathrm {P} (X_{4}\mid X_{3},X_{2},X_{1})\cdot \mathrm {P} (X_{3},X_{2},X_{1})\\&=\mathrm {P} (X_{4}\mid X_{3},X_{2},X_{1})\cdot \mathrm {P} (X_{3}\mid X_{2},X_{1})\cdot \mathrm {P} (X_{2},X_{1})\\&=\mathrm {P} (X_{4}\mid X_{3},X_{2},X_{1})\cdot \mathrm {P} (X_{3}\mid X_{2},X_{1})\cdot \mathrm {P} (X_{2}\mid X_{1})\cdot \mathrm {P} (X_{1})\end{aligned}}}$
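The four-variable factorization can also be checked in the reverse direction: start from a joint distribution, derive each conditional factor by marginalization, and confirm that the product reproduces the joint value. The Python sketch below assumes four binary variables and an arbitrary, strictly positive joint table. The weights and the helper names `marginal` and `chain_product` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution over (x1, x2, x3, x4), each variable binary.
assignments = list(product([0, 1], repeat=4))
weights = [i + 1 for i in range(len(assignments))]  # arbitrary positive weights
total = sum(weights)
joint = {a: Fraction(w, total) for a, w in zip(assignments, weights)}

def marginal(prefix):
    """P(X_1 = prefix[0], ..., X_k = prefix[k-1]), summing out later variables."""
    k = len(prefix)
    return sum(p for a, p in joint.items() if a[:k] == prefix)

def chain_product(a):
    """Product over k of P(X_k | X_{k-1}, ..., X_1), evaluated at assignment a."""
    result = Fraction(1)
    for k in range(1, len(a) + 1):
        # P(X_k | X_{k-1}, ..., X_1) = P(X_1, ..., X_k) / P(X_1, ..., X_{k-1})
        result *= marginal(a[:k]) / marginal(a[:k - 1])
    return result

# The factorization should reproduce every entry of the joint table.
matches = all(chain_product(a) == joint[a] for a in assignments)
print(matches)  # True
```

The product telescopes: each denominator cancels the previous numerator, which is why the identity holds for any joint distribution whose conditioning events have positive probability.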

References

• Schum, David A. (1994). The Evidential Foundations of Probabilistic Reasoning. Northwestern University Press. p. 49. ISBN 978-0-8101-1821-8.
• Klugh, Henry E. (2013). Statistics: The Essentials for Research (3rd ed.). Psychology Press. p. 149. ISBN 1-134-92862-9.
• Russell, Stuart J.; Norvig, Peter (2003). Artificial Intelligence: A Modern Approach (2nd ed.). Upper Saddle River, New Jersey: Prentice Hall. p. 496. ISBN 0-13-790395-2.
• "The Chain Rule of Probability", developerWorks, Nov 3, 2012.