Purdue University

EAS 657

Geophysical Inverse Theory

Robert L. Nowack

Lecture 4

 

 

Optimization in a Hilbert Space

 

            With an inner product (I.P.) defined, we can perform a direct sum decomposition of V, thus

 

 

                                           V = V1 + V2    

 

 

 

 

 

 

Note that in a direct sum decomposition, V1 and V2 are orthogonal subspaces; neither subspace by itself includes all vectors in V, but together they span V.

 

Now for every I.P., we can find an orthonormal basis via Gram-Schmidt.  The converse is also true:  for any basis {v1, …, vN} of V we can find some inner product for which these vectors are orthonormal.  (This can be used later to show that optimality depends on the particular inner product and error norm chosen!)

 

            Now, the problem is:  given any vector v and a subspace V1 of V,

 

 

 

 

find a vector in V1 which is the “best approximation” to v, i.e., find a vector v1 in V1 which is “closest” to v in some sense.  Alternatively, defining an error vector e = v − v1, we want to minimize ||e||.

 

 

 

 

            The projection theorem states that given a vector v in V and a subspace V1, there exists a unique vector v1 in V1 such that ||e|| = ||v − v1|| is minimized.  In addition, the vector e is orthogonal to the subspace V1.

 

1)  Decompose V into V = V1 + V2, where V2 = V1⊥, by a direct sum decomposition with an orthogonal basis.  Write the vector v = v1 + v2, where v1 ∈ V1 and v2 ∈ V2.

An arbitrary error vector can be written, where w1 is any vector in V1, as

e = v − w1 = (v1 − w1) + v2

Now, since (v1 − w1) ∈ V1 and v2 ∈ V2 are orthogonal,

||e||² = ||v1 − w1||² + ||v2||²

The error is minimized if we let w1 = v1.  Then, e = v2 ∈ V1⊥, where the error vector e is perpendicular to all vectors in V1.

 

Many optimization problems in a Hilbert Space are based on this idea!
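As a quick numerical check of the projection theorem, the following sketch (Python/NumPy, with an arbitrarily chosen vector and subspace) projects v onto a subspace V1 spanned by an orthonormal basis and verifies that the error vector is orthogonal to V1 and is no larger than the error for any other candidate in V1.

```python
import numpy as np

# Illustrative check of the projection theorem in R^3 (all values chosen arbitrarily).
# V1 is spanned by the orthonormal vectors q1 and q2; v is an arbitrary vector in V.
q1 = np.array([1.0, 0.0, 0.0])
q2 = np.array([0.0, 1.0, 0.0])
v  = np.array([2.0, 3.0, 4.0])

# Orthogonal projection of v onto V1 (expansion in the orthonormal basis).
v1 = (q1 @ v) * q1 + (q2 @ v) * q2
e  = v - v1

# The error vector is orthogonal to every basis vector of V1 ...
print(e @ q1, e @ q2)                                      # both ~ 0
# ... and no other vector w1 in V1 gives a smaller error norm.
w1 = 1.7 * q1 - 0.3 * q2                                   # an arbitrary competitor in V1
print(np.linalg.norm(v - v1) <= np.linalg.norm(v - w1))    # True
```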

 

 

 

Application of Projections

 

            Ex)       Let V = R2 be spanned by two vectors v1 and v2.

 

Gram-Schmidt can be used to find the projection of v2 onto v1.  First,

 

 

The first vector is already normalized.  The projection of v2 onto v1 is

 

 

Now,

 

 

or

 

 

This is also normalized, and together with the first vector it forms an orthonormal basis for R2.
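The same construction can be written as a small routine.  Below is a minimal Gram-Schmidt sketch in Python/NumPy; the two input vectors are placeholders, since the specific v1 and v2 of this example are not reproduced above.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors (classical Gram-Schmidt)."""
    basis = []
    for v in vectors:
        # Subtract the projection of v onto each previously accepted basis vector.
        for q in basis:
            v = v - (q @ v) * q
        basis.append(v / np.linalg.norm(v))
    return np.array(basis)

# Placeholder vectors spanning R^2 (not the specific vectors of the example above).
v1 = np.array([1.0, 0.0])
v2 = np.array([1.0, 1.0])
Q = gram_schmidt([v1, v2])
print(Q)           # rows are orthonormal vectors
print(Q @ Q.T)     # ~ identity matrix
```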

 

            Ex)       Find the projection of a vector b onto the subspace defined by the equation aᵀx = 0 (a two-dimensional subspace perpendicular to the vector a), where a is a given vector and V = R3.

 

 

 

 

Let the subspace spanned by a be called S⊥ and the plane perpendicular to a be called S.  Then,

 

S + S⊥ = V = R3

 

Any vector can then be uniquely projected onto S and onto S⊥.  In this case, it is easiest to first project onto S⊥.  Let

 

â = a / ||a||    (the normalized version of a)

 

Now,

b⊥ = (âᵀ b) â    (the projection of b onto S⊥)

then,

bS = b − b⊥

and

aᵀ bS = 0,   so bS lies in S.

            Thus the vector bS in S that “best approximates” b is its projection onto S.
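A numerical sketch of this decomposition follows (Python/NumPy, with placeholder values for a and b, since the specific numbers of this example are not reproduced here):

```python
import numpy as np

# Split a vector b into its components in S_perp = span{a} and S = {x : a^T x = 0}.
# The vectors a and b below are placeholders, not the values used in the lecture example.
a = np.array([1.0, 1.0, 1.0])
b = np.array([2.0, 0.0, 1.0])

a_hat  = a / np.linalg.norm(a)          # normalized version of a
b_perp = (a_hat @ b) * a_hat            # projection of b onto S_perp
b_S    = b - b_perp                     # projection of b onto S (best approximation in S)

print(a @ b_S)                          # ~ 0: b_S lies in the plane a^T x = 0
print(np.allclose(b_S + b_perp, b))     # True: the direct sum decomposition recovers b
```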

 

We can always represent an arbitrary vector x in terms of a basis {v1, …, vN} in V.  Thus,

x = c1 v1 + c2 v2 + … + cN vN

If we have constructed an orthonormal basis {q1, …, qN}, then we can write

x = (q1, x) q1 + (q2, x) q2 + … + (qN, x) qN

where (qi, x) denotes the inner product of qi with x.

            Ex)       Let v(t) be a periodic signal, with period T, in a signal space V such that

v(t) = Σ_{n = −N}^{N} cn e^{i 2π n t / T}

Now let,

v1(t) = Σ_{n = −M}^{M} cn e^{i 2π n t / T}

where v1(t) is also a periodic signal with period T in a subspace V1 spanned by the first 2M + 1 Fourier coefficients with M < N.  Then the best approximation of a signal v(t) in the subspace V1 is just the truncated series.  However, a smoother signal could be obtained by tapering the truncation window, but this would require a modified inner product.
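The sketch below illustrates this truncation numerically using the FFT; the square-wave test signal and the values of M and N are arbitrary choices, and only the idea of keeping the first 2M + 1 Fourier coefficients comes from the notes.

```python
import numpy as np

# Best approximation of a T-periodic signal by the first 2M + 1 Fourier terms.
T, Nsamp, M = 2.0 * np.pi, 512, 5
t = np.linspace(0.0, T, Nsamp, endpoint=False)
v = np.sign(np.sin(t))                        # test signal: square wave with period T

c = np.fft.fft(v) / Nsamp                     # discrete Fourier coefficients of v
c_trunc = np.zeros_like(c)
c_trunc[:M + 1] = c[:M + 1]                   # harmonics 0 .. M
c_trunc[-M:]    = c[-M:]                      # harmonics -M .. -1  (2M + 1 terms in total)

v1 = np.real(np.fft.ifft(c_trunc) * Nsamp)    # truncated-series approximation of v
print(np.linalg.norm(v - v1))                 # error of the best approximation in V1
```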

 

            Ex)       In R3 with the standard inner product, what is the best approximation to a given vector in the x1x2 plane?  It is just the vector with its x3 component set to zero.

 

            Ex)       What is the best approximation to a vector v in the subspace V1 defined by 8x + 5y + 4z = 0?  The projection theorem states that we must decompose v into v1 ∈ V1 and v2 ∈ V1⊥, then

 

v = v1 + v2

 

where

v2 = (aᵀv / aᵀa) a   with   a = (8, 5, 4)ᵀ,   and   v1 = v − v2

            Let a subspace S be spanned by the columns of a matrix A (M x N) in RM.  For the case above, find two independent vectors in V1, perpendicular to a = (8, 5, 4)ᵀ, and put them in as the columns of A.  Now project a vector b onto the subspace S spanned by the column vectors a1, a2, …, aN.  Then,

 

 

 

                 bP = x1a1 + x2a2 + … + xNaN      or,

 

                bP = Ax = [a1, a2, …, aN] (x1, x2, …, xN)ᵀ

 

 

 

Now, the error vector e will be perpendicular to all vectors in S.  Then for e = b – Ax

 

     Aᵀ(b − Ax) = 0

 

This gives

 

Aᵀb = AᵀA x

 

and

 

x = (AᵀA)⁻¹ Aᵀb

 

(Note: for independent columns of A, (AᵀA) is invertible, i.e., if the columns of A, a1, a2, …, aN, form a basis for S.)

 

 

Then,

 

bP = Ax = A(AᵀA)⁻¹Aᵀ b

 

This is the general case of projecting a vector b onto a subspace S with a basis specified by the columns of a matrix A.
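A minimal numerical sketch of this projection formula, applied to the plane 8x + 5y + 4z = 0 from the earlier example (the vector b and the particular choice of columns for A are placeholders, and the normal equations are solved rather than forming the inverse explicitly):

```python
import numpy as np

def project_onto_columns(A, b):
    """Project b onto the subspace spanned by the (independent) columns of A."""
    # Solve the normal equations A^T A x = A^T b instead of forming (A^T A)^-1.
    x = np.linalg.solve(A.T @ A, A.T @ b)
    return A @ x

# Columns of A are two independent vectors in the plane 8x + 5y + 4z = 0.
A = np.array([[ 5.0,  4.0],
              [-8.0,  0.0],
              [ 0.0, -8.0]])
b = np.array([1.0, 2.0, 3.0])               # arbitrary vector to project

bP = project_onto_columns(A, b)
print(bP)
print(np.allclose(A.T @ (b - bP), 0))       # True: the error is perpendicular to S
```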

 

 

 

Linear Transformations Between Finite Dimensional Vector Spaces

 

            Let L map a vector v in V to a vector w in W,  L: V → W.

 

 

 

 

            A linear transformation satisfies

 

1)   L(v1 + v2) = L(v1) + L(v2) 

 

2)   L(av) = aL(v) 

 

The second part can be remembered as: if a system is linear, then “if you double the input, you double the output”.

 

A linear transformation is “onto” if for any w ∈ W, there is a vector v ∈ V such that L(v) = w.

 

 

 

 

The “range” of a linear transformation is the subspace of W consisting of all vectors of the form L(v).  This is referred to as R(L) (for an “onto” transformation, R(L) equals all of W).

 

 

 

 

A linear transformation is “one-to-one” if for any w in R(L), there is a unique v such that L(v) = w.

 

            A linear transformation is “invertible” if

 

1)   it is one-to-one (invertible on its range)

 

2)   onto

 

The “null space” of a linear transformation, N(L), is the subspace of V consisting of the vectors v such that L(v) = 0.

 

 

 

 

            Dim{N(L)} = 0 if and only if the transformation is one-to-one.  To show one direction, assume N(L) = {0} and suppose L(v1) = L(v2) = w for two vectors v1 and v2.  Then, L(v1) − L(v2) = L(v1 − v2) = 0, so (v1 − v2) ∈ N(L).  But N(L) = {0}, so (v1 − v2) = 0 and v1 = v2; hence L is one-to-one.

 

 

 

Eigenvectors and Eigenvalues

 

            These are defined for L: V → V.  If L(v) = λv for some nonzero vector v, then λ is called an eigenvalue of L and v is the corresponding eigenvector.  (More on this will be given with respect to matrices later.)
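For matrices, the defining relation L(v) = λv is easy to check numerically; the 2 x 2 matrix in this sketch is an arbitrary example.

```python
import numpy as np

# Numerical check of A v = lambda v for a small matrix (chosen arbitrarily).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eig(A)

for lam, v in zip(eigvals, eigvecs.T):       # eigenvectors are the columns of eigvecs
    print(np.allclose(A @ v, lam * v))       # True: v is mapped onto a multiple of itself
```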

 

 

 

Matrix Transformations

 

            Let A: RN → RM with inner products

(x, y) = xᵀ y   in both RN and RM.

 

These will be called the usual inner products.  Vectors will be represented by columns and A by an M x N matrix with entries aij, i = 1, …, M, j = 1, …, N.  (M is the number of rows (first index) and N is the number of columns (second index).)  A can be written

 

 

R(A) is the vector space spanned by the columns of A, and N⊥(A) is the vector space spanned by the rows of A.  The “rank” of a matrix A is the number of linearly independent columns of A (or, equivalently, the number of independent rows of A).

 

 

 

 

            We want to show that the columns of A span R(A) and the rows of A span N⊥(A).  Let

v = α1 e1 + α2 e2 + … + αN eN

where ei are the standard basis vectors in RN specified as

ei = (0, …, 0, 1, 0, …, 0)ᵀ    (with the 1 in the ith position)

Then,

Av = α1 Ae1 + α2 Ae2 + … + αN AeN

Let ai be the ith column of the matrix A, so that

Aei = ai

then,

Av = α1 a1 + α2 a2 + … + αN aN

with αi the components of v.  Thus, the range of A, R(A), is spanned by the columns of A.  Let

biᵀ be the ith row of the matrix A,

then,

Av = (b1ᵀv, b2ᵀv, …, bMᵀv)ᵀ

If v ∈ N(A), then Av = 0, and biᵀv = 0 for each i.  Then, v is perpendicular to all bi, the rows of the matrix A.  Thus, N⊥(A) is spanned by the rows of A.

 

            Now the rank of A is denoted by r and is equal to the dimension of the space spanned by the columns of A, as well as the dimension of the space spanned by the rows of A.  Thus, the rank r is

 

r(A) = Dim{R(A)} = Dim{N⊥(A)}
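This equality of column rank and row rank can be checked numerically; the matrix in the sketch below is an arbitrary rank-2 example.

```python
import numpy as np

# The rank of A equals the dimension of its column space and of its row space.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])          # third row = row1 + row2, so A is singular

print(np.linalg.matrix_rank(A))          # 2: number of independent columns
print(np.linalg.matrix_rank(A.T))        # 2: row rank equals column rank
```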

 

            Ex)       Let

 

 

with A: R3 → R3.

 

            Find a basis for the range R(A).

 

1)   Check Det A.  If Det A = 0, then the columns of A are linearly dependent.  Since Det A = 0, choose as a basis for R(A) two linearly independent columns of A.

 

 

Thus, the rank of A is equal to 2.

 

Choose as a basis for N⊥(A) the independent rows of A:

 

(1 0 1)

(0 1 1)

 

To find N(A), use the fact that V = N(A) + N⊥(A).  Thus, for N(A), find all vectors perpendicular to N⊥(A).
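A sketch of this last step (using only the two independent rows quoted above, since the full matrix is not reproduced here): the null space can be obtained numerically from the right singular vectors that are orthogonal to the row space.

```python
import numpy as np

# N(A) consists of the vectors perpendicular to N_perp(A), i.e. to the rows of A.
rows = np.array([[1.0, 0.0, 1.0],
                 [0.0, 1.0, 1.0]])

# The right singular vectors beyond the rank span the orthogonal complement of the row space.
_, s, Vt = np.linalg.svd(rows)
rank = int(np.sum(s > 1e-12))
null_basis = Vt[rank:]                   # basis for N(A); here one vector, ~ +/- (1, 1, -1)

print(null_basis)
print(np.allclose(rows @ null_basis.T, 0))   # True: perpendicular to N_perp(A)
```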

 

            Matrices are examples of linear transformations.  Also, any linear transformation between finite dimensional vector spaces is “isomorphic” with a matrix.

 

 

 

 

Use L1 to go from V to RN and L2 to go from W to RM.  Then

 

L(v) = L2⁻¹ A L1(v)

 

where L1, L2 are just basis representations of vectors in V and W.  Thus,

L1(v) = x,   the coordinate vector of v in the basis chosen for V,

then,

y = A x

and

w = L2⁻¹(y),   the vector in W whose coordinate vector in the chosen basis is y.

Thus, A tells us how to map from the basis representation of a vector in V to the basis representation of the corresponding vector in W.
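As an illustrative sketch of this correspondence (the specific map below, differentiation of quadratic polynomials, is an assumed example and not one from the notes):

```python
import numpy as np

# Representing a linear map L by a matrix A via coordinate (basis) maps.
# V = polynomials of degree <= 2 with basis {1, t, t^2}; W = polynomials of degree <= 1
# with basis {1, t}; L = d/dt.  L1 sends p(t) = c0 + c1 t + c2 t^2 to (c0, c1, c2) in R^3.
#
# The columns of A are the coordinate vectors of L applied to each basis polynomial of V:
#   d/dt(1) = 0,   d/dt(t) = 1,   d/dt(t^2) = 2t
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0]])

c = np.array([3.0, 5.0, 7.0])        # coordinates of p(t) = 3 + 5t + 7t^2
print(A @ c)                         # [ 5. 14.]  ->  p'(t) = 5 + 14 t, as expected
```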