Purdue University

EAS 657

Geophysical Inverse Theory

Robert L. Nowack

Lecture 13

 

 

Practical considerations

 

I will cover briefly the following practical issues:

 

1)   Log variables

 

2)   Inequality/equality constraints

 

3)   L1 norm

 

1)  Log variables

 

            In many cases, using ln(parameters) and ln(data) as the variables improves the inversion procedure.

 

            Ex)  For many physical problems the parameters can only have positive values and we must discriminate against negative parameter values.

 

            Ex)  There is generally a problem of units for different parameters and using log variables can address this.

 

            Ex)  In some cases, the data are complex (having real and imaginary parts), and one wants to weight the two parts appropriately.

 

            As we saw earlier, methods that help to compress the range of singular values and eigenvalues will give better convergence.  These issues can be dealt with by using a ln or log parameterization in both the data and the model parameters.

 

            In our original linearized inversion scheme,

                        δy_i = Σ_j A_ij δx_j ,   where A_ij = ∂y_i/∂x_j                    (1)

 

The solution to (1) can be used to update the parameters as

                        x_j(new) = x_j(old) + δx_j

With log parameters, we set up the linearized system as

                        δ(ln y_i) = Σ_j [∂(ln y_i)/∂(ln x_j)] δ(ln x_j)

or,

                        δy_i / y_i = Σ_j [A_ij x_j / y_i] (δx_j / x_j)     (no sum on i)

 

Then,

                        ln x_j(new) = ln x_j(old) + δ(ln x_j)

or

                        x_j(new) = x_j(old) exp[δ(ln x_j)]

a)   Note that this procedure never allows x_j to change sign, since the update multiplies x_j(old) by a positive exponential factor.

 

b)   Changing the units describing either the data or the parameters does not cause any changes in the equations.

 

c)   In addition, complex data are dealt with using the definition of the log of a complex number,

                        ln y = ln |y| + iφ ,   where y = |y| e^{iφ}

Then,

                        Re(δ ln y) = δ|y| / |y|     (relative amplitude)

and

                        Im(δ ln y) = δφ     (phase)

            Ex)  Since high frequency seismic waves are of the form

                        u(ω) ≈ A e^{iωT}

where T is the travel time, we have ln u = ln A + iωT, and travel time inversions are log variable parameterizations using only the imaginary part of the log for particular arrivals in the data,

                        δT_i = Σ_j (∂T_i/∂x_j) δx_j

where the model parameters x_j can be either velocity or slowness variables.

 

d)   One other advantage of log parameterizations is the ability to deal with large parameter changes as one iterates a linearized inversion scheme.  Roughly one iteration may be needed for each order of magnitude that our starting parameters are off.

 

            Ex)  Consider the problem of finding x given y = tan(x).  Of course, tan⁻¹(y) is a standard function, but we will solve this by a linearized iteration, where

                        tan(x_n + δx) ≈ tan(x_n) + sec²(x_n) δx

Now for a linear parameterization, δy = y − tan(x_n), then

                        δx = cos²(x_n) [y − tan(x_n)]

or,

                        x_{n+1} = x_n + cos²(x_n) [y − tan(x_n)]

 

In log variables,

                        δ(ln y) = [x sec²(x) / tan(x)] δ(ln x)

Then,

                        ln x_{n+1} = ln x_n + [tan(x_n) / (x_n sec²(x_n))] [ln y − ln tan(x_n)]

 

Let y = 0.01 (tan x = 0.01, where x_true ≈ 0.01 radians).  Let's start with x_0 = 1.5 rad (a poor initial guess).

 

            Now use both iterative procedures.

 

 

         Linear variables                          Log variables

 n      x         y/tan x                  n      x         y/tan x
 0      1.5       7.09 x 10^-4             0      1.5       7.09 x 10^-4
 1      1.4295    1.42 x 10^-3             1      1.0665    5.52 x 10^-3
 2      1.2903    2.88 x 10^-3             2      0.1356    7.33 x 10^-2
 3      1.0250    6.07 x 10^-3             3      0.0103    9.75 x 10^-1
 4      0.5840    1.51 x 10^-2             4      0.01      1.0
 5      0.1310    7.59 x 10^-2
 6      0.0113    8.83 x 10^-1
 7      0.01      1.0


Thus, the log scheme makes about an order of magnitude improvement in the fit at each step while the linear scheme only improves the fit by about a factor of 2 at each step until near the end.
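
As a quick numerical check (not part of the original notes), the two iteration schemes can be coded in a few lines of Python; the sketch below uses the update formulas derived above and reproduces the table.

import numpy as np

# Sketch: linearized iteration for tan(x) = y in linear vs. log variables.
y_obs = 0.01                 # observed data, tan(x_true) = 0.01
x_lin, x_log = 1.5, 1.5      # poor starting guess (radians)

for n in range(1, 8):
    # Linear variables: delta_y = sec^2(x) delta_x  =>  x <- x + cos^2(x) [y - tan(x)]
    x_lin = x_lin + np.cos(x_lin) ** 2 * (y_obs - np.tan(x_lin))

    # Log variables: delta(ln x) = [tan(x) / (x sec^2 x)] [ln y - ln tan(x)]
    dlnx = (np.tan(x_log) * np.cos(x_log) ** 2 / x_log) * \
           (np.log(y_obs) - np.log(np.tan(x_log)))
    x_log = x_log * np.exp(dlnx)   # multiplicative update, x never changes sign

    print(n, x_lin, y_obs / np.tan(x_lin), x_log, y_obs / np.tan(x_log))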

 

 

 

2a)  Linear equality constraints (LEP)

 

            Consider solving Ax = y in the least squares sense subject to P equality constraints Gx = h.  These are hard constraints that we want to be satisfied exactly.

 

            Ex)  Assume a constraint that fixes one parameter, for example x_k = c.

            Ex)  Assume a constraint that fixes a linear combination of the parameters, for example Σ_j g_j x_j = c.

 

a)   We could append the constraint equations to Ax = y as additional weighted rows, but this might lead to some roundoff errors, and we don't know how large to make the relative weighting.  This also does not reduce the number of unknowns.

 

b)   We could solve this by increasing the number of unknowns through “Lagrange multipliers”.

 

            We want to minimize ||Ax − y||² with the constraints Gx = h.  This can be written as: minimize

                        F(x) = (1/2) ||Ax − y||²

subject to Gx = h.  Now, there is no solution if h does not lie in the range of G.  Thus, we assume that h lies in the range of G so that a solution to the constraint equations exists.

 

            If the P rows of G are linearly independent, then the constraints remove P degrees of freedom in the choice of x.  Consider two feasible solutions x_1, x_2 that solve Gx = h; then

                        G(x_1 − x_2) = 0

and

                        (x_1 − x_2) ∈ N(G)

where N(G) is the nullspace of G.

 

Let x_2 = x_1 + p, where p is any feasible direction with respect to the linear equality constraints.  Then, Gp = 0.

 

            In order to examine the optimality of F(x) about some x_A along a feasible direction p, we examine the first order Taylor expansion

                        F(x_A + εp) = F(x_A) + ε p^T ∇F(x_A) + O(ε²)

For this to be a possible minimum, p^T ∇F(x_A) = 0 for every feasible direction p.  This is a constrained stationary point, and from this ∇F(x_A) must be perpendicular to any feasible direction of the constraints.  Thus, ∇F(x_A) must be in the row space of G, or ∇F(x_A) = G^T λ for some vector λ, whose components are called the Lagrange multipliers.  These λ are unique only if the rows of G are linearly independent.  Thus, in order to construct an equivalent unconstrained problem, we must solve

                        ∇F(x) = G^T λ

and

                        Gx = h

for the unknowns x and λ.

Then, with F(x) = (1/2) ||Ax − y||²,

                        A^T A x − G^T λ = A^T y

and

                        Gx = h

This can be written in matrix form as

                        [ A^T A   −G^T ] [ x ]   [ A^T y ]
                        [   G       0  ] [ λ ] = [   h   ]

where the augmented unknown vector (x, λ) has length N + P.  This is an augmented system which can then be solved by any of the unconstrained methods that we used previously.
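
As an illustration (a minimal NumPy sketch with made-up A, y, G, and h, not taken from the notes), the augmented system can be assembled and solved directly:

import numpy as np

# Sketch: solve min ||Ax - y||^2 subject to Gx = h via the augmented system.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))            # 6 data, N = 4 parameters (illustrative)
y = rng.standard_normal(6)
G = np.array([[1.0, 1.0, 0.0, 0.0]])       # P = 1 constraint: x_1 + x_2 = 1
h = np.array([1.0])

N, P = A.shape[1], G.shape[0]
K = np.block([[A.T @ A, -G.T],
              [G, np.zeros((P, P))]])      # (N+P) x (N+P) augmented matrix
rhs = np.concatenate([A.T @ y, h])

sol = np.linalg.solve(K, rhs)
x, lam = sol[:N], sol[N:]                  # parameters and Lagrange multipliers
print("x =", x, "  Gx - h =", G @ x - h)   # constraint satisfied to roundoff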

 

c)   As an alternative approach, we can solve a reduced set of equations by using the SVD of G.  Thus, using the SVD,

                        G = U_P Λ_P V_P^T

and a general solution to the constraints is

                        x = V_P Λ_P^{-1} U_P^T h + V_0 α

where α is an arbitrary vector of length N − P (assuming the number of nonzero singular values of G is P) and the columns of V_0 span the nullspace of G.  Now substitute this into

                        ||Ax − y||²

or,

                        ||A V_0 α − (y − A V_P Λ_P^{-1} U_P^T h)||²

which is now an unconstrained problem for the reduced vector α.

 

            Thus, we have three ways of solving linear equality constrained problems:

 

a)   augmented equations (roundoff errors) – unknown vector has length N

b)   Lagrange multipliers – unknown vector has length N + P

c)   reduced set of equations (assumes we have explicit constraint equations) – unknown vector has length N − P (see the sketch below)
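
A corresponding sketch of approach c), using the same illustrative A, y, G, and h as in the previous sketch (variable names are ours), eliminates the constraints with the SVD of G and solves the reduced problem:

import numpy as np

# Sketch: eliminate Gx = h using the SVD of G, then solve a reduced problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
y = rng.standard_normal(6)
G = np.array([[1.0, 1.0, 0.0, 0.0]])
h = np.array([1.0])

U, s, Vt = np.linalg.svd(G)                # G = U diag(s) V^T
P = int(np.sum(s > 1e-12))                 # number of nonzero singular values
Vp, V0 = Vt[:P].T, Vt[P:].T                # row-space and nullspace bases of G

x_part = Vp @ ((U.T @ h)[:P] / s[:P])      # particular solution of Gx = h
alpha = np.linalg.lstsq(A @ V0, y - A @ x_part, rcond=None)[0]   # reduced vector, length N - P
x = x_part + V0 @ alpha

print("x =", x, "  Gx - h =", G @ x - h)   # same x as the augmented system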

 

 

 

2b)  Linear inequality constraints (LIP)

 

            Examples of linear inequality constraints include

 

            Ex)  nonnegativity constraints, x_j ≥ 0

 

            Ex)  range constraints on variables, a_j ≤ x_j ≤ b_j

 

1)   Certainly the easiest way to introduce range constraints is to solve the damped problem

                        minimize ||Ax − y||² + ε² ||x − x_0||²

which forces x to be "near" x_0 for a small damping ε.

 

2)   Minimize

                        ||Ax − y||²

      subject to Gx ≥ h.  Note that ≤ constraints are converted to ≥ constraints by multiplying by −1 as needed.

 

            A lot of work has been done on this, mostly in books on optimization (see "Solving Least Squares Problems" by C. Lawson and R. Hanson (1995)).  Each constraint defines a hyperplane that divides the space R^N of solutions x into two half-spaces, one in which the constraint is satisfied (the feasible half-space) and one in which the constraint is not satisfied (the infeasible half-space).  The inequality constraints therefore define a volume in R^N which might have zero, finite, or infinite extent.

 

            Ex)  A zero extent feasible space is shown below.

 

 

 

 

            Ex)  A finite extent feasible space is shown below.

 

 

 

 

            Ex)  An infinite extent feasible space is shown below.

 

 

 

 

The feasible volume has the shape of a polyhedron, since for linear inequality constraints each bounding equation is a hyperplane.  This polyhedron is convex, like a football with no indentations.

 

We need to distinguish between constraints that are active and inactive.  An inactive constraint does not restrict the perturbation of the solution.  An active constraint does.  Let Ĝx = ĥ denote the set of active constraints (the active rows of Gx ≥ h) at some point x.  Two such solutions can be written as

                        Ĝ x_1 = ĥ   and   Ĝ x_2 = ĥ

where x_2 − x_1 = p, and p is a boundary perturbation (Ĝp = 0).  Any perturbation p into the feasible space then gives Ĝp ≥ 0.

 

            Consider a Taylor series about x along a boundary perturbation p,

                        F(x + εp) = F(x) + ε p^T ∇F(x) + O(ε²)

For there to be a minimum, then

                        p^T ∇F(x) = 0     for all boundary perturbations p

or ∇F(x) must be in the row space of Ĝ.  Then,

                        ∇F(x) = Ĝ^T λ     (similar to the equality constraints case)

 

(Note, inactive constraints have zero Lagrange multipliers).  However, since non-boundary perturbations are also feasible directions with respect to the inequality constraints, the point x will not be optimal if there exists any non-boundary perturbation that is a descent direction for F.

 

            To take this condition into account, we must have all λ_i ≥ 0.  Thus, we have

                        ∇F(x) = Ĝ^T λ

                        λ_i ≥ 0     for the boundary (active) constraints

which is a variation of the Kuhn–Tucker conditions.

 

            Thus, a crucial distinction for the LIP case is the restriction on the sign of λ.  A negative λ_i would mean that a feasible direction of descent exists, so the point would not be optimal.  This is much more complicated than the LEP problem since we don't know in advance which, if any, of the inequality constraints are active.  This must be solved iteratively.  Inequality constraints are incorporated as equality constraints as necessary for solutions on the boundary.

 

For an interior solution,

                        ∇F(x) = 0     (the unconstrained least squares solution)

For a solution on the boundary, minimize

                        ||Ax − y||²

subject to Ĝx = ĥ for the active constraints.  Lawson and Hanson (1995) give an algorithm for least squares with linear inequality constraints, as well as nonnegative least squares (NNLS), which is also available in MATLAB.
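
In practice these problems are usually handed to library routines.  A minimal sketch (assuming SciPy is available; the matrices are illustrative) of nonnegative and range-constrained least squares:

import numpy as np
from scipy.optimize import nnls, lsq_linear

# Sketch: inequality-constrained least squares with standard library routines.
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))
y = rng.standard_normal(8)

x_nn, rnorm = nnls(A, y)                       # nonnegativity x >= 0 (Lawson-Hanson style NNLS)
res = lsq_linear(A, y, bounds=(-0.5, 0.5))     # range constraints -0.5 <= x_j <= 0.5

print("NNLS solution:   ", x_nn)
print("Bounded solution:", res.x)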

 

 

 

3)  Non-Gaussian statistics (the L1 norm)

 

            In some cases there will be a few wildly scattered errors in the data.  In general, these will cause least squares estimates to go haywire.

 

            Ex)  Take the average of M numbers

                        y_i = x + n_i ,   i = 1, …, M,   where n_i is the noise           (2)

 

For the least squares solution, we assume that n is a set of Gaussian random variables with zero mean.  A Gaussian PDF is shown by the solid line in the figure below.

 

 

 

 

By minimizing Σ_i (y_i − x)², we obtain the estimate x̂ = (1/M) Σ_i y_i.  Thus, the maximum likelihood solution is just the "average value".  Let's assume that M = 5 and that one of the five readings has a value of about 10^6 due to some reading error.  Then, the average is of order 10^6/5 ≈ 2 x 10^5.

Thus, the least squares solution is heavily influenced by the single bad measurement.  One way to avoid this is to take the "sample median".  The sample median is obtained by sorting the y_i from smallest to largest and taking the middle value.  In our case, y_median = 1.0.  Medians are much more robust to bad data points.
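
A two-line sketch makes the point (the four "good" readings near 1.0 are hypothetical stand-ins for the example above):

import numpy as np

# Sketch: one wild reading ruins the mean but not the median.
y = np.array([0.9, 1.0, 1.0, 1.1, 1.0e6])      # M = 5, one reading error of ~10^6
print("least squares (mean):", y.mean())       # ~2 x 10^5, dominated by the outlier
print("L1 (median):         ", np.median(y))   # 1.0, unaffected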

 

            From a maximum likelihood point of view, use in equation (2) a modified PDF for n.  For example, let

                        P(n_i) ∝ exp(−|n_i| / σ)

This is an exponential distribution of the errors and has a much larger "tail" than the Gaussian distribution of errors, as shown by the dotted line in the figure above.  This allows for the possibility of much larger excursions in the errors.  Maximizing this PDF is equivalent to minimizing the exponent.  For Ax = y, we want to minimize

                        E = Σ_i | y_i − Σ_j A_ij x_j |

Recall this is just the L1 norm as the length measure of the residuals.  Hence, these minimizations are called L1 norm problems.

 

            Ex)  For the case above, we minimize

                        E = Σ_i | y_i − x |

or, setting dE/dx = 0,

                        Σ_i sgn(y_i − x) = 0

where sgn(z) = ±1 is the sign function.

 

 

 

 

The solution to this is to sort the y_i from smallest to largest and then take the middle value.  This is illustrated in the figure above and is the same as the sample median.  The solution x results when there are as many points y_i above it as below it in value, i.e., the middle value.

 

            The equations for fitting a line are

                        y_i = x_1 + x_2 t_i ,   i = 1, …, M

The L1 norm solution for fitting a line finds two of the above M equations and solves these exactly, such that the total absolute error over all the equations is smallest.  This is illustrated in the figure below.

 

 

 

 

            This can be done computationally by a modification of a well known algorithm called the simplex method, originally used to solve linear programming problems, i.e., maximizing a linear objective function subject to linear inequality constraints.
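
As a sketch of how this is set up in practice (using SciPy's linprog as the linear-programming solver; the straight-line data and the outlier are made up for illustration), the L1 line fit can be posed as a linear program in the unknowns (x_1, x_2) and the absolute residuals e_i:

import numpy as np
from scipy.optimize import linprog

# Sketch: L1-norm straight-line fit posed as a linear program.
t = np.arange(10.0)
y = 2.0 + 0.5 * t
y[7] += 50.0                                  # one wild outlier

M = len(t)
A = np.column_stack([np.ones(M), t])          # model y_i = x_1 + x_2 t_i

# minimize sum(e) subject to -e_i <= y_i - (Ax)_i <= e_i, unknowns [x, e]
c = np.concatenate([np.zeros(2), np.ones(M)])
A_ub = np.block([[ A, -np.eye(M)],
                 [-A, -np.eye(M)]])
b_ub = np.concatenate([y, -y])
bounds = [(None, None)] * 2 + [(0, None)] * M

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print("L1 fit (x_1, x_2):", res.x[:2])                             # ~ (2.0, 0.5), ignores the outlier
print("L2 fit (x_1, x_2):", np.linalg.lstsq(A, y, rcond=None)[0])  # pulled toward the outlier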

 

The claim often made is that the L1 norm does what a human brain might do: it disregards points that are outliers.

 

            For more details on robust data fitting using the L1 norm, see Claerbout and Muir (1973), "Robust modeling with erratic data", Geophysics, 38, 826-844.  A short but effective algorithm for the Ax = y problem solved using an L1 norm is given by Barrodale and Roberts (1974), "Solution of an overdetermined system of equations in the L1 norm", Communications of the ACM, 17, 319-320.