Deriving the Sherman-Morrison formula

In this post, I am going to motivate an intuitive derivation of the Sherman-Morrison formula using Schur complements.


Recall the following fact (see also this Wikipedia entry on Schur complements).

Lemma 1. Let MR(n+m)×(n+m)M\in{\mathbb{R}}^{(n+m)\times (n+m)} be a block matrix with blocks M=(ABCD).\begin{aligned} M = \begin{pmatrix} A & B\\ C & D \end{pmatrix} .\end{aligned} If AA is invertible, then there exist invertible matrices P1P_1, Q1Rn×nQ_1\in{\mathbb{R}}^{n\times n} such that P1MQ1=(ABD1CD).\begin{aligned} P_1 M Q_1 = \begin{pmatrix} A - BD^{-1}C&\\& D \end{pmatrix}.\end{aligned} Similarly, if DD is invertible, then there exist invertible matrices P2P_2, Q2Rn×nQ_2\in{\mathbb{R}}^{n\times n} such that P2MQ2=(ADCA1B).\begin{aligned} P_2 M Q_2 = \begin{pmatrix} A &\\&D - CA^{-1}B \end{pmatrix}.\end{aligned}

For the purposes of this post, we won’t need to know what PiP_i and QiQ_i look like. It is enough to know that these matrices and their inverses have simple forms.

We will apply Lemma 1 to the matrix (Abc1),\begin{aligned} \begin{pmatrix} A & b\\ c^\intercal & 1 \end{pmatrix},\end{aligned} where ARn×nA\in{\mathbb{R}}^{n\times n} is invertible and bb, cRnc\in{\mathbb{R}}^n. For notational convenience, let P=P1P21P = P_1P_2^{-1} and Q=Q21Q1Q = Q_2^{-1}Q_1. We have that (Abc1)=P(A1cA1b)Q.\begin{aligned} \begin{pmatrix} A - bc^\intercal & \\ & 1 \end{pmatrix} = P \begin{pmatrix} A & \\ & 1 - c^\intercal A^{-1}b \end{pmatrix}Q.\end{aligned} This identity tells us that AbcA-bc^\intercal is invertible if and only if 1cA1b1 - c^\intercal A^{-1}b is invertible, i.e., nonzero. Finally, inverting both sides of the equation, we have that ((Abc)11)=Q1(A111cA1b)P1.\begin{aligned} \begin{pmatrix} (A-bc^\intercal)^{-1} & \\ & 1 \end{pmatrix} = Q^{-1}\begin{pmatrix} A^{-1} &\\ & \tfrac{1}{1- c^\intercal A^{-1}b} \end{pmatrix}P^{-1}.\end{aligned} In particular, the top left block of the matrix on the right is the inverse of AbcA - bc^\intercal.


Looking back at the derivation, the main idea was to use a change of variables (the PiP_i, QiQ_i matrices) to “shift” the inversion of AbcA - bc^\intercal, an a priori difficult task, to the inversion of the scalar 1cA1b1 - c^\intercal A^{-1}b, a much simpler task. This generalizes in a number of ways. For example, the same derivation applied to the (n+m)×(n+m)(n+m)\times(n+m) dimensional block matrix (ABCIm),\begin{aligned} \begin{pmatrix} A & B\\ C^\intercal & I_m \end{pmatrix},\end{aligned} where ARn×nA\in{\mathbb{R}}^{n\times n} is invertible and BB, CRn×mC\in{\mathbb{R}}^{n\times m}, gives a formula for the inverse of ABCA - BC^\intercal in terms of the inverses of AA and ICA1BI - C^\intercal A^{-1}B. This latter formula is known as the Woodbury matrix identity.

Last updated Feb 28, 2021