In probability theory, the chain rule (also called the general product rule) permits the calculation of any member of the joint distribution of a set of random variables using only conditional probabilities. The rule is useful in the study of Bayesian networks, which describe a probability distribution in terms of conditional probabilities.
Chain rule for events
Two events
The chain rule for two random events $A$ and $B$ states that

$$P(A\cap B)=P(A\mid B)\cdot P(B).$$
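This is immediate from the definition of conditional probability: whenever $P(B)>0$,

$$P(A\mid B)=\frac{P(A\cap B)}{P(B)}\quad\Longrightarrow\quad P(A\cap B)=P(A\mid B)\cdot P(B).$$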
Example
This rule is illustrated in the following example. Urn 1 has 1 black ball and 2 white balls, and Urn 2 has 1 black ball and 3 white balls. Suppose we pick an urn at random and then select a ball from that urn. Let event $A$ be choosing the first urn, so that $P(A)=P(\overline{A})=1/2$. Let event $B$ be choosing a white ball. The chance of choosing a white ball, given that we have chosen the first urn, is $P(B\mid A)=2/3$. The event $A\cap B$ is their intersection: choosing the first urn and a white ball from it. Its probability can be found by the chain rule for probability:

$$P(A\cap B)=P(B\mid A)\cdot P(A)=\frac{2}{3}\times\frac{1}{2}=\frac{1}{3}.$$
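As a sanity check, here is a minimal Python simulation of this experiment (the urn contents mirror the example above; the event encoding is our own). The empirical frequency of $A\cap B$ should approach the chain-rule value $1/3$:

```python
import random

# Urn 1: 1 black, 2 white; Urn 2: 1 black, 3 white (as in the example).
URNS = {
    1: ["black", "white", "white"],
    2: ["black", "white", "white", "white"],
}

def draw():
    """Pick an urn uniformly at random, then a ball from that urn."""
    urn = random.choice([1, 2])
    ball = random.choice(URNS[urn])
    return urn, ball

trials = 100_000
hits = sum(draw() == (1, "white") for _ in range(trials))  # event A ∩ B

print(f"empirical P(A ∩ B) ≈ {hits / trials:.4f}")  # ≈ 0.3333
print(f"chain rule value:    {2/3 * 1/2:.4f}")      # exactly 1/3
```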
More than two events
For more than two events $A_1,\ldots,A_n$, the chain rule extends to the formula

$$P(A_n\cap\ldots\cap A_1)=P(A_n\mid A_{n-1}\cap\ldots\cap A_1)\cdot P(A_{n-1}\cap\ldots\cap A_1),$$

which by induction may be turned into

$$P(A_n\cap\ldots\cap A_1)=\prod_{k=1}^{n}P\!\left(A_k\,\middle|\,\bigcap_{j=1}^{k-1}A_j\right),$$

where the empty intersection at $k=1$ is read as the whole sample space, so the first factor is simply $P(A_1)$.
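To see the product form in action, the following Python sketch (a hypothetical card-drawing scenario, not taken from the source) computes the probability that the first four cards of a shuffled 52-card deck are all aces. With $A_k$ the event that the $k$-th card is an ace, each conditional factor is $(4-(k-1))/(52-(k-1))$:

```python
from fractions import Fraction

def conditional(k):
    """P(A_k | A_1 ∩ ... ∩ A_{k-1}): after k-1 aces are drawn,
    4-(k-1) aces remain among 52-(k-1) cards."""
    return Fraction(4 - (k - 1), 52 - (k - 1))

# Chain rule: P(A_1 ∩ ... ∩ A_4) = Π_{k=1}^{4} P(A_k | earlier draws)
p = Fraction(1)
for k in range(1, 5):
    p *= conditional(k)

print(p)  # 1/270725
```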
Example
With four events ($n=4$), the chain rule is

$$\begin{aligned}P(A_4\cap A_3\cap A_2\cap A_1)&=P(A_4\mid A_3\cap A_2\cap A_1)\cdot P(A_3\cap A_2\cap A_1)\\&=P(A_4\mid A_3\cap A_2\cap A_1)\cdot P(A_3\mid A_2\cap A_1)\cdot P(A_2\cap A_1)\\&=P(A_4\mid A_3\cap A_2\cap A_1)\cdot P(A_3\mid A_2\cap A_1)\cdot P(A_2\mid A_1)\cdot P(A_1)\end{aligned}$$
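The expansion can be verified by brute-force enumeration on a small sample space. In the sketch below, the four events live on a single roll of a fair die (the particular events are arbitrary choices for illustration), and the direct probability of the four-way intersection is compared with the chain-rule product:

```python
# Sample space: one roll of a fair six-sided die.
OMEGA = {1, 2, 3, 4, 5, 6}
A1 = {2, 3, 4, 5, 6}   # at least 2
A2 = {2, 4, 6}         # even
A3 = {4, 5, 6}         # at least 4
A4 = {4, 6}            # even and at least 4

def P(event):
    """Probability of an event under the uniform distribution."""
    return len(event) / len(OMEGA)

def P_cond(event, given):
    """P(event | given) by counting, assuming P(given) > 0."""
    return len(event & given) / len(given)

lhs = P(A4 & A3 & A2 & A1)
rhs = (P_cond(A4, A3 & A2 & A1)
       * P_cond(A3, A2 & A1)
       * P_cond(A2, A1)
       * P(A1))
print(lhs, rhs)  # both ≈ 0.3333 (= 1/3)
```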
Chain rule for random variables
Two random variables
For two random variables $X,Y$, to find the joint distribution we can apply the definition of conditional probability to obtain

$$P(X,Y)=P(X\mid Y)\cdot P(Y).$$
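For discrete variables this factorization can be checked entry by entry on a pmf table. The sketch below uses a made-up joint pmf for two binary variables (the numbers are arbitrary, not from the source), derives $P(Y)$ and $P(X\mid Y)$ from it, and confirms that their product recovers the joint:

```python
from collections import defaultdict

# A hypothetical joint pmf P(X, Y) over binary X and Y.
joint = {
    (0, 0): 0.10, (0, 1): 0.30,
    (1, 0): 0.20, (1, 1): 0.40,
}

# Marginal P(Y) by summing out X.
p_y = defaultdict(float)
for (x, y), p in joint.items():
    p_y[y] += p

def p_x_given_y(x, y):
    """P(X=x | Y=y) from the definition of conditional probability."""
    return joint[(x, y)] / p_y[y]

# Chain rule: P(X, Y) = P(X | Y) * P(Y) recovers every joint entry.
for (x, y), p in joint.items():
    assert abs(p - p_x_given_y(x, y) * p_y[y]) < 1e-12
print("joint pmf recovered from P(X | Y) * P(Y)")
```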
More than two random variables
Consider an indexed collection of random variables $X_1,\ldots,X_n$. To find the value of any member of their joint distribution, we can apply the definition of conditional probability to obtain

$$P(X_n,\ldots,X_1)=P(X_n\mid X_{n-1},\ldots,X_1)\cdot P(X_{n-1},\ldots,X_1).$$
Repeating this process with each final term creates the product

$$P\!\left(\bigcap_{k=1}^{n}X_k\right)=\prod_{k=1}^{n}P\!\left(X_k\,\middle|\,\bigcap_{j=1}^{k-1}X_j\right).$$
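In applications such as Bayesian networks, this product is typically used in reverse: the joint distribution is specified through its conditional factors. A minimal sketch with three binary variables (all probability values are invented for illustration):

```python
# Chain-rule factors of a hypothetical three-variable distribution:
# P(X1), P(X2 | X1), and P(X3 | X2, X1), each over binary values.
p_x1 = {0: 0.6, 1: 0.4}
p_x2_given_x1 = {0: {0: 0.7, 1: 0.3},
                 1: {0: 0.2, 1: 0.8}}
p_x3_given_x2_x1 = {(0, 0): {0: 0.9, 1: 0.1},
                    (0, 1): {0: 0.5, 1: 0.5},
                    (1, 0): {0: 0.4, 1: 0.6},
                    (1, 1): {0: 0.3, 1: 0.7}}

def joint(x1, x2, x3):
    """P(X1=x1, X2=x2, X3=x3) = P(X3 | X2, X1) * P(X2 | X1) * P(X1)."""
    return (p_x3_given_x2_x1[(x2, x1)][x3]
            * p_x2_given_x1[x1][x2]
            * p_x1[x1])

# The factors define a valid distribution: the joint sums to 1.
total = sum(joint(a, b, c)
            for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)  # ≈ 1.0
```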
Example
With four variables ($n=4$), the chain rule produces this product of conditional probabilities:

$$\begin{aligned}P(X_4,X_3,X_2,X_1)&=P(X_4\mid X_3,X_2,X_1)\cdot P(X_3,X_2,X_1)\\&=P(X_4\mid X_3,X_2,X_1)\cdot P(X_3\mid X_2,X_1)\cdot P(X_2,X_1)\\&=P(X_4\mid X_3,X_2,X_1)\cdot P(X_3\mid X_2,X_1)\cdot P(X_2\mid X_1)\cdot P(X_1)\end{aligned}$$