EAS 657
Geophysical Inverse Theory
Robert L. Nowack
Lecture 10
In the Bayesian method assuming Gaussian statistics, we want to find x such that the pdf of the model given the data is maximized. For a linear forward problem,
![]()
This pdf is maximized if the combined exponent term is minimized. Thus,
![]()
This assumes that the data and model residuals y – Ax and (x-x0) follow Gaussian distributions. In the maximum likelihood method, only the first term is minimized and thus Gaussian statistics is associated with weighted least squares. In Bayesian statistics, the two terms are simultaneously minimized.
We will first investigate the statistics involved with the maximum likelihood method and later with Bayesian inversion.
Ex) Find the errors in the model from errors in the data where
![]()
The covariance matrix for errors in the model from the data errors is

where < > denotes the average
and
is an outer
product. If the yi’s are independent and
identically distributed (i.i.d.), then
![]()
and
![]()
where
is the unit covariance
operator Cu. For Case 3a) described
earlier where N(A) = {0}, then
![]()
Ex) Ax
= y resulting from fitting a
line to points as y = a x + b or

where b is the intercept and a is the slope

The errors in (b,a) resulting from the data errors are given by
where
the model parameters are ![]()
and


and
.
Thus, the magnitude of the errors in a and b are dependent on <x2>. If the range of x is small, then <x2>
~ <x><x>, resulting in a large magnification of the errors. There is also a cross term
and the correlation
coefficient is
![]()
This gives
![]()
If <x> = 0,
then a and b are uncorrelated. This results if the data is centered about x = 0.
If <x>
0, and if <x2> is small, then there
will be a strong negative correlation between slope and intercept (b,a). Thus, it is important to center the data
about x = 0 when performing a line
fit.
Ex) For the averaging of M numbers yi with
, then

In the maximum likelihood method, we want to find <y> that maximizes this pdf. We can also minimize the exponent
![]()
and this minimizes the sum of the
squares. For two measurements, y1 and y2, this is
![]()
or
![]()
which is the average of the two
values. The values y1 and y2
are thus equally weighted. Suppose the
error in y1 is
1 cm and the error is y2
is
.1 cm. We can use the reciprocal
of data covariance matrix as the weights where
![]()
Then,
![]()
and
![]()
Now, we have weighted the measurement y2 by 100 times over y1 in the average.
In Bayesian inversion using Gaussian statistics, we would like to simultaneously minimize both

Minimizing the second term minimizes the weighted model norm.
Ex) Assume the model parameters are

For example, chose the prior values and uncertainties as
![]()
![]()
![]()
Note that these do not have the same physical units or
dimensions. We will use
to equalize the
weights. Let

The minimization of the model norm would give
Min: 
Thus, we would minimize
![]()
Ex) Assume a layered model where some layers are thin and some are thick. For a minimum model norm, then
Minimize: ![]()
or by discrete analogy,
Minimize: ![]()

Note that here
is an implicit weighting
function. The larger
is the more weight. A similar implicit weighting by block size or
volume occurs in higher dimensions.
We want to generalize the results when
![]()
For the Bayesian or stochastic inversion assuming Gaussian statistics, we want to find the minimum of
![]()
Thus, given the observed data y and the data covariance matrix Cd, we also need the prior information,

We will first investigate the problem graphically. Since the prior model distribution is assumed to be independent of the actual data values, then the joint pdf of y and x is
![]()

We have yet to introduce our knowledge of the forward problem (the relation between data and model). A linear relation, Ax – y = 0 defines a hyperplane in (x,y) space. In general, we may have a nonlinear relation between x and y or g(x,y) = 0 or y = g(x).
We now want to find (xest,ypre) such that
1) g(xest,ypre) = 0
2) maximize the joint pdf f(x,y)

If g(x,y) = 0 is satisfied only for one point such that f(x,y) is maximized, then this is a unique solution. However, we can have several different situations that could occur. These are two end members cases that can occur.
A) The prior model
parameters are much more certain than the observed data yobs, then

xest is near the prior model x0.
B) The prior model
parameters are much less certain than the observed data yobs, then xest
can be farther away from x0.

Using Bayes Theorem, we can obtain the posterior distribution from

For Gaussian statistics for the data and prior model and a linear forward problem y = Ax, then
![]()
The maximum of the posterior distribution for x can be found by maximizing this distribution or minimizing the exponent. Thus
![]()
Assume, for
simplicity, that the parameters and data are real valued. Let
and
, then using index notation
![]()
where repeated indices in individual terms imply summation and both C and D are symmetric where

Then,
![]()
![]()
In vector notation this can be written
![]()
and
![]()
This results in
(1a)
This is called the stochastic inverse based on Bayesian analysis. This can also be written as
(1b)
Ex) Let
. If the data is very
inexact, then
. From Eqn. (1a)
![]()
For this case, the data provides no constraints and the best solution is the prior model.
Ex) Let
. If the prior model
is very inexact, then
, and
![]()
then,
![]()
This is weighted least squares solution. If
, then
![]()
This is the standard least squares solution.
Ex) If
and
, then

If
, and for
, this results in the damped least squares solution,
![]()
Thus, the stochastic inverse “damps” out those eigenvectors that would give rise to perturbations in x larger than the prior model variance. The eigenvectors associated with very small eigenvalues are reasonably taken care of by the use of damping without resorting to the singular value decomposition.
For the stochastic inverse, the
resolution matrix
is a distorted version
of [VpVp*] and also I-R is a
distorted version of {V0V0*}. But, if the damping is not too severe, R is
still a useful approximation to the resolution matrix.
As
in the damped least
squares is changed from 0 to
, then the graph below results

where
![]()
This is called
the “L-curve” by Hansen (1998) (“Rank-Deficient and Discrete Ill-Posed
Problems”,
The resolution and covariance matrices can be written as
![]()
![]()
where the term
results from the
amplification of the data errors back to the model by
and
is projection of the
prior model covariance and errors onto the null space of A.
For the stochastic inverse,
![]()
and
![]()
The resolution matrix can then be interpreted as a filter resulting in the estimated model parameters x.
![]()
Now,
![]()
![]()
Then,
![]()
Note these
forms for
are a formal composite
of both data and remaining prior model errors in the stochastic inverse
formulation. Also, the form
is not valid in the
least squares limit where R = I and the other form should be used
operationally. For the case where
, then
approaches the
weighted least squares limit for the model covariance.
Data statistics can often be determined experimentally. But, model probabilities are purely subjective and one must beware of personal bias. Another hidden pitfall is if the forward modeling matrix A is incorrect. Difficulties will arise if the underlying model structure that was used to determine A is unrealistic.
Ex) A common structural model is a layered Earth and many false conclusions in geophysics have resulted from this assumption.
There are
two alternative forms for the stochastic inverse. First, the following identity is true for
positive definite covariance matrices
and
.
![]()
This can be shown by manipulating to get the identity
![]()
Starting with Eqn. (1b) given earlier, we can get the alternative form
(1c)
This form results in the inversion of an MxM matrix instead of an NxN matrix and is commonly used when there are less data than model parameters since the matrix to be inverted is smaller. The two forms in equations (1b) and (1c) can be used depending on whether the model or data spaces are larger.
The stochastic inverse is the most sensible approach to an inverse problem provided the covariances (or probability distributions) are known or can be estimated. It can be viewed as a damped least squares solution and one can show that a little damping is always useful.
For these discussions, it is useful to renormalize our data and parameters so that they have unit variance, a very useful numerical procedure!
Consider the data covariance matrix
![]()
Define
![]()
Also consider the prior model covariance matrix
![]()
Define
![]()
Now let
![]()
This results in the weighted forward problem
.
The stochastic solution for the linear case given earlier, can be written either as
Form A)
(Eqn. 1b)
or
Form B)
(Eqn. 1c)
The reweighted problem can be written in terms of
![]()
Then,
Form A1) ![]()
Form B2) ![]()
and
. The solution to the
original problem is then found from
![]()
![]()
Nonlinear Problems
For nonlinear problems g(x,y) = 0 and y = g(x), complicated multi-minimum error surfaces can often occur as shown in the figure below.

Most direct methods for solving nonlinear problems require
the sampling of the error surface by many calculations of the forward problem
and evaluating the error
to find the
minimum. Different methods include:
1) The grid search method. In this approach, the model space is
uniformly sampled in order to find the smallest
. Unfortunately, this
method is very slow for model spaces with many parameters.
2) The
.
3) The Simulated Annealing Method. In this approach, an analogy to thermodynamics is used to restrict the area searched in the model space depending on a decreasing “temperature” parameter.
4) Genetic Algorithms. In these approaches, an analogy to genetics in biology is used to select regions of the model space to sample.
These approaches are all ongoing areas of research and I refer to Sen and Stoffa (1995) “Global Optimization Methods in Geophysical Inversion” Elsevier, for more details.
A gradient
style approach can also be used similar to
For the nonlinear case, use a

To first order, this can be truncated to the first two terms. Let
evaluated at xk
be the linear sensitivity matrix or Fréchet
derivative for the nonlinear operator g(x). Using a Bayesian approach, an iterative
algorithm is then obtained as,
![]()
This can also be written in normalized forms (see, Tarantola, 1987). The second term in the brackets is zero for the first iteration, but is non-zero for higher iterations. This term has the function of keeping the solution referenced to the original prior x0 and it’s uncertainty.