Alex L. Wang – Deriving the Sherman-Morrison formula

In this post, I am going to motivate an intuitive derivation of the Sherman-Morrison formula using Schur complements.

Recall the following fact (see also this Wikipedia entry on Schur complements).

Lemma 1. Let $M\in{\mathbb{R}}^{(n+m)\times (n+m)}$ be a block matrix with blocks $\begin{aligned} M = \begin{pmatrix} A & B\\ C & D \end{pmatrix} .\end{aligned}$ If $A$ is invertible, then there exist invertible matrices $P_1$ , $Q_1\in{\mathbb{R}}^{n\times n}$ such that $\begin{aligned} P_1 M Q_1 = \begin{pmatrix} A - BD^{-1}C&\\& D \end{pmatrix}.\end{aligned}$ Similarly, if $D$ is invertible, then there exist invertible matrices $P_2$ , $Q_2\in{\mathbb{R}}^{n\times n}$ such that $\begin{aligned} P_2 M Q_2 = \begin{pmatrix} A &\\&D - CA^{-1}B \end{pmatrix}.\end{aligned}$

For the purposes of this post, we won’t need to know what $P_i$ and $Q_i$ look like. It is enough to know that these matrices and their inverses have simple forms.

We will apply Lemma 1 to the matrix $\begin{aligned} \begin{pmatrix} A & b\\ c^\intercal & 1 \end{pmatrix},\end{aligned}$ where $A\in{\mathbb{R}}^{n\times n}$ is invertible and $b$ , $c\in{\mathbb{R}}^n$ . For notational convenience, let $P = P_1P_2^{-1}$ and $Q = Q_2^{-1}Q_1$ . We have that $\begin{aligned} \begin{pmatrix} A - bc^\intercal & \\ & 1 \end{pmatrix} = P \begin{pmatrix} A & \\ & 1 - c^\intercal A^{-1}b \end{pmatrix}Q.\end{aligned}$ This identity tells us that $A-bc^\intercal$ is invertible if and only if $1 - c^\intercal A^{-1}b$ is invertible, i.e., nonzero. Finally, inverting both sides of the equation, we have that $\begin{aligned} \begin{pmatrix} (A-bc^\intercal)^{-1} & \\ & 1 \end{pmatrix} = Q^{-1}\begin{pmatrix} A^{-1} &\\ & \tfrac{1}{1- c^\intercal A^{-1}b} \end{pmatrix}P^{-1}.\end{aligned}$ In particular, the top left block of the matrix on the right is the inverse of $A - bc^\intercal$ .

Looking back at the derivation, the main idea was to use a change of variables (the $P_i$ , $Q_i$ matrices) to “shift” the inversion of $A - bc^\intercal$ , an a priori difficult task, to the inversion of the scalar $1 - c^\intercal A^{-1}b$ , a much simpler task. This generalizes in a number of ways. For example, the same derivation applied to the $(n+m)\times(n+m)$ dimensional block matrix $\begin{aligned} \begin{pmatrix} A & B\\ C^\intercal & I_m \end{pmatrix},\end{aligned}$ where $A\in{\mathbb{R}}^{n\times n}$ is invertible and $B$ , $C\in{\mathbb{R}}^{n\times m}$ , gives a formula for the inverse of $A - BC^\intercal$ in terms of the inverses of $A$ and $I - C^\intercal A^{-1}B$ . This latter formula is known as the Woodbury matrix identity.