EAS 657
Geophysical Inverse Theory
Robert L. Nowack
Lecture 7
The Generalized Inverse Operator
The singular value decomposition of a general linear operator can be written
![]()
where

are the eigenvectors of (AA*) associated with the non-zero eigenvalues and span R(A) = R(AA*).

are eigenvectors of (A*A) associated with the non-zero eigenvalues that span Nperp(A) = N(A*A). The singular values are,

where
are the non-zero
eigenvalues of A*A, AA*. Also,
are the non-zero
positive eigenvalues of the larger matrix
.
Now, we are interested in solving the linear matrix equation
Ax = y

We have previously classified problems Ax = y into
the following categories.
1) N(A) = {0}, Rperp(A) = {0}. For this case, A is then one-to-one and invertible and
![]()
Recall that
![]()
with
![]()
where Det Mij is the minor formed by deleting row
i and column j of A. A is in invertible
if Det A
0.
In terms of the SVD, M=N=P, and
![]()
where Up
is MxP,
is PxP, and
is PxN. Because Up and Vp now
completely span the domain and range, then
Up*Up = UpUp* = 1 Vp*Vp = VpVp* = 1
thus,
![]()
The solution is then,
![]()
(Note in general,
for ![]()
Note that for the case where N(A) = 0 and Rperp(A) = 0, instead of explicitly computing A-1, we can also factor A as
A = L U
where L is a lower triangular matrix
and U is an upper triangular
matrix
.
Then, Ax = y can be written as L Ux
= y.
Now solve
and
by back substitution. This is called Gaussian elimination. The number of algebraic operation counts is
![]()
backsubstitution ~ N2
We can derive this factorization of A with householder transformations and QR algorithms later. Note, to directly compute, A-1 is on the order of N3 operations which is larger. Thus, inverses are computed only when they are needed.
2) Purely
underdetermined systems. For this case,
a nonunique solution to Ax = y results where
N(A)
0
We know from the previous sections that if Rperp(A) = {0}, then the minimum norm solution is
XMN = A*(AA*)-1y
Recall that (ABC)* = C*B*C*, then
![]()
If U0 = {0} or Rperp(A) = {0}, then
![]()
Thus,
![]()
This is the generalized inverse for purely underdetermined case in terms of the SVD.
3a) Purely overdetermined system. For this case, the system is incompatible, but the solution is unique, where
Rperp(A)
0, N(A) = {0}
We need to find the best approximation solution. From previous sections, this can be found from the least squares solution
![]()
If V0 = {0} or N(A) = {0}, then
![]()
thus
![]()
3b) Both
N(A)
0 and Rperp(A)
0. For this case, we want to find the best
approximation – minimum norm solution since the system is incompatible and the
solution is nonunique. In this case, the
generalized inverse can directly be written in terms of the SVD as
![]()
The solution to Ax = y for all cases can then be written as
![]()
where V0 are the basis vectors spanning the
nullspace N(A). The first term on the
right is the minimum norm solution and the second term results from the
nullspace required by additional constraints to the problem. However, in many cases, if it is known that
either Rperp(A) = {0} or N(A) = {0}, it may be computationally more
efficient to compute the specialized forms for
.
But for the
specialized solutions, there may be numerical difficulties when dealing with
small eigenvalues for (A*A) and (AA*) since these have eigenvalues that are
squared
. For the earlier
example, Ax = y, then

then,

and

Thus,

and

This is the best approximation – minimum norm solution.
Resolution Matrix
The best approximation-minimum norm solution is given by
![]()
Let x by any solution to y = Ax, then
![]()
where
is called the resolution matrix and is the operator
that determines the relationship between any possible solution x and the particular solution, xg. In terms of the SVD, R can be written
![]()
Thus,
![]()
where Vp includes the vectors in Nperp(A)
and
are coefficients. This operator projects x onto the Vp space resulting in the minimum norm
solution in Nperp(A). Note
that R is an NxN matrix. If P = N, then
is an identity matrix.
If a particular row of R is “
-like” with a one on the main diagonal, then that component
of xg is called
“well resolved”. If a row of R is not “
-like”, then certain components of xg are not well resolved. Since

then,
![]()

For P=N, then R is
–like and xg
= x.

The so-called Backus-Gilbert Method finds solutions that minimize various norms of the “off diagonal” components of R in order to obtain a generalized inverse and solution xg.
As an example of computing R, consider

with

Thus,

Note that the “trace” of R (sum of diagonal elements) = P, which is the rank of A. This shows that the x3 component is well separated and uniquely resolved, but only the average of the components x1 and x2 is well resolved. The value of the Rii is often used to quantify the degree of resolution of model parameter xi.
For an
incompatible system Ax
y, then
![]()
where yg is component of the data in R(A). Then, we have
![]()
or
![]()
where Up are vectors in R(A) and
are the coefficients. N is called the relevancy matrix and is MxM. If N=I, then all the data are sensitive to
the operator A and P=M.
Next we look at the errors in the
solution. Assume the data have errors
, then
![]()
This is the part of the error in the solution mapped from
the data errors
, but doesn’t include the nonuniqueness ambiguity. Let’s look at the “covariance matrix”. Recall that the average of a random variable x is

where f(x) is called the probability density function or pdf.

The variance
measures the spread about
the mean <x>, where

The covariance of two random variables is defined as

where f(x,y) is the joint pdf of random variables x and y.
If x and y are formally independent and uncorrelated, then f(x,y) = f(x)f(y) and <xy> = <x><y>; then
= 0.
For a vector of random variables

then we can define a covariance matrix about the mean as

where
is the outer product.
2-D Example
The linear correlation coefficient between two random variables x1 and x2 is

In the extreme cases,
, then x1
and x2 are linearly
related to one another. The figure below
shows the pdf of two random variables with a positive correlation coefficient.

The figure below shows an example of the pdf of two random
variable x1 and x2 which are uncorrelated
with
= 0.

The covariance matrix of two random variables can be written

and is a symmetric matrix.
1) The
diagonal elements of C measure the “width”
of the joint pdf along
the axes.
2) The off-diagonal elements of C indicate the degree of correlation between pairs of random variables.
For the special case of Gaussian or normally distributed random variables, the mean and variances are all that are needed to specify the entire pdf.
1) 1-D case
![]()
2) N-D case
![]()
For our
case,
and
. The model covariance
matrix can be written as

where
is the average. Now
![]()
![]()
where
= Cd is the
data covariance. If all
are identically
distributed and uncorrelated independent random variables, then
![]()
where
is the individual
component of the data variances Then,
![]()
where
is called the unit
covariance matrix for the solution due to errors in the data. It amplifies data errors back to the model.
Summary
For each of
the cases, we will write down
, R, N, and Cu.
1) Rperp(A) = {0}, N(A) = {0}. This is the invertible case where ![]()
Cu = A-1A-1* = (A*A)-1
R = A-1A = I
N = AA-1 = I
2) Rperp(A) = {0}, N(A)
{0}. This is the purely undetermined case and we
want the minimum norm solution.
![]()
![]()
![]()
![]()
3a) Rperp(A)
{0}, N(A) = {0}. This is the purely overdetermined case and we
want the best approximation or least squares solution.
![]()
![]()
![]()
![]()
3b) Rperp(A)
{0}, N(A)
{0}. This is the general case and we want the best
approximation – minimum norm solution.
(This is the most
general form)
![]()
![]()
![]()
![]()
Note that

Tradeoffs Between
Variance and Resolution
If
is small but non-zero,
then
will be large. Thus, even if the data errors
are small, a small
will magnify these tremendously
and will disturb the solution.
We would
like to eliminate the effects of small singular values on the solution. We could just eliminate small
’s, but this would reduce P resulting in a poorer resolution
matrix R (or a less
-like resolution).
Thus, if we want stability in the solution due to perturbations or
errors in the data, we must sacrifice resolution or resolving power of the
various components of the solution.
Recall
![]()
and for uncorrelated data errors
, then
![]()
For our example case,

then,

The cross correlation of x1 and x2 is
resulting in perfect
correlation between x1 and
x2
Also,

Thus,
. Note that Cx
is not a measure of the absolute error in the model since the true model
can also differ by any amount in N(A).
Recall the original observation equation x1 + x2
= 1 which does not require x1
= x2 which is only the
minimum norm solution.
Thus, Cx will underestimate the true model errors if
1) There exists a non-zero N(A).
2) There are additional modeling errors.
3) If nonlinearities exist and the problem was linearized.
There is also a tradeoff between
–like resolution and the covariance of the solution resulting
from the data errors
As a note
on notation, we have used
for the singular
values
these are the singular
values of matrix A
where
where
are the eigenvalues of
A*A and AA* for i=1,P
We also use
and
for variances of
random variables x1 and x2, as well as
and
for the variances of
the data and model parameters. You need
to watch carefully for the context that these symbols are used.
The R (resolution) and N (relevance) are matrices which are just projection operators, thus,
![]()
![]()
The unit covariance operator is
. The data covariance is
.
For independent identically distributed data, then

The model errors resulting from data errors are then
![]()
For
, then
as shown before.
The Approximate
Generalized Inverse and Damped Least Squares
The generalized inverse takes care of any singularities due to zero singular values. But, in practice, we can have severe difficulties with small, but non-zero singular values, especially with numerical round-off errors.
One way
around this is to set a cutoff in the singular values and place all singular
value vectors with
into an extended null
space
. A very similar
effect can be attained without resorting to SVD. by a procedure called
“damping”. This involves solving the
extended system

This is formally overdetermined where the first M rows are
just the system Ax = y. The last N rows are the system Ix ~ 0. These last equations try to force the
solution x to zero. The constant
relatively weights the
equations Ax = y with equations that force x as close as possible to
zero. For
, one gets the original system. For
, then x goes
to zero.
Solving this extended system by standard least squares results in
![]()
![]()
and
![]()
this is the damped least squares solution. Using the SVD, then

and