Schur complement

In linear algebra and the theory of matrices, the Schur complement of a block matrix is defined as follows.

Suppose A, B, C, D are respectively p × p, p × q, q × p, and q × q matrices, and D is invertible. Let

M=\left[{\begin{matrix}A&B\\C&D\end{matrix}}\right]

so that M is a (p + q) × (p + q) matrix.

Then the Schur complement of the block D of the matrix M is the p × p matrix defined by

M/D:=A-BD^{-1}C\,

and, if A is invertible, the Schur complement of the block A of the matrix M is the q × q matrix defined by

M/A:=D-CA^{-1}B.

If A or D is singular, substituting a generalized inverse for the inverses in M/A and M/D yields the generalized Schur complement.
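The two definitions above can be illustrated numerically. The following is a minimal sketch using NumPy; the helper name `schur_complements` and the example matrix are assumptions made for illustration, and both A and D are assumed to be invertible.

```python
import numpy as np

def schur_complements(M, p):
    """Return (M/D, M/A) for M partitioned so that A is the leading p x p block.

    Assumes both A and D are invertible.
    """
    A, B = M[:p, :p], M[:p, p:]
    C, D = M[p:, :p], M[p:, p:]
    # np.linalg.solve(D, C) computes D^{-1} C without forming D^{-1} explicitly.
    M_over_D = A - B @ np.linalg.solve(D, C)
    M_over_A = D - C @ np.linalg.solve(A, B)
    return M_over_D, M_over_A

# Illustrative 3 x 3 matrix with p = 2 and q = 1.
M = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.0],
              [2.0, 0.0, 2.0]])
M_over_D, M_over_A = schur_complements(M, p=2)
```

For this matrix, M/D is the 2 × 2 matrix with rows (2, 1) and (1, 3), and M/A is the 1 × 1 matrix 10/11.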

The Schur complement is named after Issai Schur, who used it to prove Schur's lemma, although it had been used previously.^[1] Emilie Haynsworth was the first to call it the Schur complement.^[2] The Schur complement is a key tool in numerical analysis, statistics and matrix analysis.

Background

The Schur complement arises as the result of performing a block Gaussian elimination by multiplying the matrix M from the right with a block lower triangular matrix

L={\begin{bmatrix}I_{p}&0\\-D^{-1}C&I_{q}\end{bmatrix}}.

Here I_p denotes a p×p identity matrix. After multiplication with the matrix L the Schur complement appears in the upper p×p block. The product matrix is

{\begin{aligned}ML&={\begin{bmatrix}A&B\\C&D\end{bmatrix}}{\begin{bmatrix}I_{p}&0\\-D^{-1}C&I_{q}\end{bmatrix}}={\begin{bmatrix}A-BD^{-1}C&B\\0&D\end{bmatrix}}\\[4pt]&={\begin{bmatrix}I_{p}&BD^{-1}\\0&I_{q}\end{bmatrix}}{\begin{bmatrix}A-BD^{-1}C&0\\0&D\end{bmatrix}}.\end{aligned}}

This is analogous to an LDU decomposition. That is, we have shown that

{\begin{aligned}{\begin{bmatrix}A&B\\C&D\end{bmatrix}}&={\begin{bmatrix}I_{p}&BD^{-1}\\0&I_{q}\end{bmatrix}}{\begin{bmatrix}A-BD^{-1}C&0\\0&D\end{bmatrix}}{\begin{bmatrix}I_{p}&0\\D^{-1}C&I_{q}\end{bmatrix}},\end{aligned}}

and the inverse of M may thus be expressed in terms of only D⁻¹ and the inverse of the Schur complement (if it exists) as

{\begin{aligned}&{\begin{bmatrix}A&B\\C&D\end{bmatrix}}^{-1}={\begin{bmatrix}I_{p}&0\\-D^{-1}C&I_{q}\end{bmatrix}}{\begin{bmatrix}\left(A-BD^{-1}C\right)^{-1}&0\\0&D^{-1}\end{bmatrix}}{\begin{bmatrix}I_{p}&-BD^{-1}\\0&I_{q}\end{bmatrix}}\\[4pt]={}&{\begin{bmatrix}\left(A-BD^{-1}C\right)^{-1}&-\left(A-BD^{-1}C\right)^{-1}BD^{-1}\\-D^{-1}C\left(A-BD^{-1}C\right)^{-1}&D^{-1}+D^{-1}C\left(A-BD^{-1}C\right)^{-1}BD^{-1}\end{bmatrix}}\\[4pt]={}&{\begin{bmatrix}\left(A-BD^{-1}C\right)^{-1}&-\left(A-BD^{-1}C\right)^{-1}BD^{-1}\\-D^{-1}C\left(A-BD^{-1}C\right)^{-1}&\left(D-CA^{-1}B\right)^{-1}\end{bmatrix}}\\[4pt]={}&{\begin{bmatrix}\left(M/D\right)^{-1}&-\left(M/D\right)^{-1}BD^{-1}\\-D^{-1}C\left(M/D\right)^{-1}&\left(M/A\right)^{-1}\end{bmatrix}}.\end{aligned}}

Cf. the matrix inversion lemma, which illustrates the relationship between the above and the equivalent derivation with the roles of A and D interchanged.
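The block formula for the inverse can be checked numerically. The sketch below is illustrative only: the function name `block_inverse` is hypothetical, and the test matrix is made strictly diagonally dominant so that M, A, D and M/D are all invertible.

```python
import numpy as np

def block_inverse(A, B, C, D):
    """Invert [[A, B], [C, D]] using only D^{-1} and the inverse of M/D."""
    D_inv = np.linalg.inv(D)
    S_inv = np.linalg.inv(A - B @ D_inv @ C)  # (M/D)^{-1}
    top_right = -S_inv @ B @ D_inv
    bottom_left = -D_inv @ C @ S_inv
    bottom_right = D_inv + D_inv @ C @ S_inv @ B @ D_inv
    return np.block([[S_inv, top_right], [bottom_left, bottom_right]])

rng = np.random.default_rng(0)
p, q = 3, 2
# Entries in [-1, 1) plus 6 on the diagonal give a strictly diagonally
# dominant matrix (an assumption of this sketch, not of the formula).
M = rng.uniform(-1.0, 1.0, size=(p + q, p + q)) + 6.0 * np.eye(p + q)
A, B, C, D = M[:p, :p], M[:p, p:], M[p:, :p], M[p:, p:]
M_inv = block_inverse(A, B, C, D)
```

Explicit inverses are formed here only to mirror the formula term by term; in numerical work one would normally prefer linear solves.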

Properties

If M is a positive-definite symmetric matrix, then so is the Schur complement of D in M.
If p and q are both 1 (i.e., A, B, C and D are all scalars), we get the familiar formula for the inverse of a 2-by-2 matrix:
$M^{-1}={\frac {1}{AD-BC}}\left[{\begin{matrix}D&-B\\-C&A\end{matrix}}\right]$

provided that AD − BC is non-zero.

In general, if A is invertible, then
$M^{-1}={\begin{bmatrix}A^{-1}+A^{-1}B(M/A)^{-1}CA^{-1}&-A^{-1}B(M/A)^{-1}\\-(M/A)^{-1}CA^{-1}&(M/A)^{-1}\end{bmatrix}}$

whenever this inverse exists.

When D is invertible, the block factorization of M given above shows that the determinant of M is
$\det(M)=\det(D)\det \left(A-BD^{-1}C\right)$

which generalizes the determinant formula for 2 × 2 matrices.

(Guttman rank additivity formula) If D is invertible, the rank of M is given by
$\operatorname {rank} (M)=\operatorname {rank} (D)+\operatorname {rank} \left(A-BD^{-1}C\right)$
(Haynsworth inertia additivity formula) If M is symmetric (or Hermitian) and A is invertible, the inertia of the block matrix M is equal to the inertia of A plus the inertia of M/A.
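The determinant, rank and inertia identities listed above can be checked on a small example. This is a sketch under stated assumptions: the symmetric indefinite matrix is chosen for illustration, the helper `inertia` is hypothetical, and the tolerance used to classify eigenvalues as zero is an arbitrary choice.

```python
import numpy as np

def inertia(H, tol=1e-10):
    """Return (n_positive, n_negative, n_zero) eigenvalue counts of symmetric H."""
    eigvals = np.linalg.eigvalsh(H)
    return (int(np.sum(eigvals > tol)),
            int(np.sum(eigvals < -tol)),
            int(np.sum(np.abs(eigvals) <= tol)))

# Symmetric indefinite example with p = q = 2; A and D are both invertible.
M = np.array([[2.0,  1.0, 1.0,  0.0],
              [1.0, -3.0, 0.0,  1.0],
              [1.0,  0.0, 4.0,  1.0],
              [0.0,  1.0, 1.0, -2.0]])
p = 2
A, B, C, D = M[:p, :p], M[:p, p:], M[p:, :p], M[p:, p:]
M_over_D = A - B @ np.linalg.solve(D, C)
M_over_A = D - C @ np.linalg.solve(A, B)

# det(M) = det(D) det(M/D)
det_identity = np.isclose(np.linalg.det(M),
                          np.linalg.det(D) * np.linalg.det(M_over_D))
# rank(M) = rank(D) + rank(M/D)
rank_identity = (np.linalg.matrix_rank(M)
                 == np.linalg.matrix_rank(D) + np.linalg.matrix_rank(M_over_D))
# In(M) = In(A) + In(M/A), added componentwise
inertia_sum = tuple(i + j for i, j in zip(inertia(A), inertia(M_over_A)))
inertia_identity = inertia(M) == inertia_sum
```

Here A and M/A each have one positive and one negative eigenvalue, so M has two of each.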

Application to solving linear equations

The Schur complement arises naturally in solving a system of linear equations such as

{\begin{aligned}Ax+By&=a\\Cx+Dy&=b\end{aligned}}

where x, a are p-dimensional column vectors, y, b are q-dimensional column vectors, and A, B, C, D are as above. Assuming D is invertible, multiplying the bottom equation on the left by ${\textstyle BD^{-1}}$ and subtracting the result from the top equation, one obtains

\left(A-BD^{-1}C\right)x=a-BD^{-1}b.

Thus if one can invert D as well as the Schur complement of D, one can solve for x, and then by using the equation ${\textstyle Cx+Dy=b}$ one can solve for y. This reduces the problem of inverting a ${\textstyle (p+q)\times (p+q)}$ matrix to that of inverting a p × p matrix and a q × q matrix. In practice, one needs D to be well-conditioned in order for this algorithm to be numerically accurate.

In electrical engineering this is often referred to as node elimination or Kron reduction.
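The elimination procedure described above can be sketched as follows. The function name `solve_by_block_elimination` and the numerical values are assumptions chosen for illustration; the sketch assumes that D and the Schur complement of D are both invertible.

```python
import numpy as np

def solve_by_block_elimination(A, B, C, D, a, b):
    """Solve Ax + By = a, Cx + Dy = b via the Schur complement of D."""
    D_inv_C = np.linalg.solve(D, C)
    D_inv_b = np.linalg.solve(D, b)
    schur = A - B @ D_inv_C                       # M/D
    x = np.linalg.solve(schur, a - B @ D_inv_b)   # (M/D) x = a - B D^{-1} b
    y = np.linalg.solve(D, b - C @ x)             # back-substitute into Cx + Dy = b
    return x, y

A = np.array([[4.0, 1.0], [1.0, 3.0]])
B = np.array([[2.0, 0.0], [0.0, 1.0]])
C = np.array([[2.0, 0.0], [0.0, 1.0]])
D = np.array([[5.0, 1.0], [1.0, 4.0]])
a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])
x, y = solve_by_block_elimination(A, B, C, D, a, b)
```

Only systems with the q × q matrix D and the p × p matrix M/D are solved; the full (p + q) × (p + q) matrix is never factorized.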

Applications to probability theory and statistics

Suppose the random column vectors X, Y live in ${\textstyle \mathbb {R} ^{n}}$ and ${\textstyle \mathbb {R} ^{m}}$ respectively, and the vector (X, Y) in ${\textstyle \mathbb {R} ^{n+m}}$ has a multivariate normal distribution whose covariance is the symmetric positive-definite matrix

\Sigma =\left[{\begin{matrix}A&B\\B^{\mathsf {T}}&C\end{matrix}}\right],

where ${\textstyle A\in \mathbb {R} ^{n\times n}}$ is the covariance matrix of X, ${\textstyle C\in \mathbb {R} ^{m\times m}}$ is the covariance matrix of Y and ${\textstyle B\in \mathbb {R} ^{n\times m}}$ is the covariance matrix between X and Y.

Then the conditional covariance of X given Y is the Schur complement of C in ${\textstyle \Sigma }$, and the conditional mean involves the same matrix ${\textstyle BC^{-1}}$:^[3]

{\begin{aligned}\operatorname {Cov} (X\mid Y)&=A-BC^{-1}B^{\mathsf {T}}\\\operatorname {E} (X\mid Y)&=\operatorname {E} (X)+BC^{-1}(Y-\operatorname {E} (Y))\end{aligned}}
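The two conditioning formulas above translate directly into code. The following is a minimal sketch with NumPy; the function name `condition_gaussian`, the means and the covariance blocks are all assumptions chosen for illustration (here n = 2 and m = 1).

```python
import numpy as np

def condition_gaussian(mu_x, mu_y, A, B, C, y_obs):
    """Return the mean and covariance of X given Y = y_obs.

    A, B, C are the blocks of Sigma = [[A, B], [B^T, C]], with C invertible.
    """
    # gain = B C^{-1}; since C is symmetric, this equals (C^{-1} B^T)^T.
    gain = np.linalg.solve(C, B.T).T
    cond_mean = mu_x + gain @ (y_obs - mu_y)
    cond_cov = A - gain @ B.T  # Schur complement of C in Sigma
    return cond_mean, cond_cov

mu_x = np.array([0.0, 1.0])
mu_y = np.array([2.0])
A = np.array([[2.0, 0.5], [0.5, 1.0]])   # Cov(X)
B = np.array([[1.0], [0.5]])             # Cov(X, Y)
C = np.array([[2.0]])                    # Cov(Y)
y_obs = np.array([4.0])
cond_mean, cond_cov = condition_gaussian(mu_x, mu_y, A, B, C, y_obs)
```

For these values the conditional mean is (1, 1.5) and the conditional covariance has rows (1.5, 0.25) and (0.25, 0.875). The conditional covariance does not depend on the observed value of Y.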

If we take the matrix $\Sigma$ above to be, not a covariance of a random vector, but a sample covariance, then it may have a Wishart distribution. In that case, the Schur complement of C in $\Sigma$ also has a Wishart distribution.^{[citation needed]}