# Berkeley || Math 54 || Chapter 6 || Orthogonality and Least Squares

A grouping of mathematical concepts concerning the Inner Product function for two vectors, the length of vectors in a space, and perpendicularity of vectors in a space.
Inner Product, Length, and Orthogonality
Notation: u ∙ v = u^T v

Function: See image.

Arguments: Two n x 1 column vectors.

Inner Product/Dot Product
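As a concrete sketch of this definition (the vectors are illustrative, not from the notes), the dot product of two n x 1 vectors can be computed in plain Python:

```python
def dot(u, v):
    """Inner (dot) product u . v = u^T v of two equal-length vectors."""
    assert len(u) == len(v)
    return sum(ui * vi for ui, vi in zip(u, v))

u = [2, -5, -1]
v = [3, 2, -3]
print(dot(u, v))   # 2*3 + (-5)*2 + (-1)*(-3) = -1
```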
Notation: ||v||

Function: See image.

Arguments: A column vector.

Vector Length (Norm)
Normalizing a nonzero vector v is to divide it by its length, giving the resulting vector u a length of 1 in the direction of v.
Normalizing
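A minimal sketch of the norm and of normalizing (the example vector is illustrative): dividing a nonzero vector by its length produces a unit vector in the same direction.

```python
import math

def norm(v):
    """Length of v: the square root of v . v."""
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    """Divide v by its length to get a unit vector in the direction of v."""
    n = norm(v)
    return [x / n for x in v]

v = [3, 4]
print(norm(v))        # 5.0
print(normalize(v))   # [0.6, 0.8]
```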
The distance between u and v is written as dist(u, v), and is the length of the vector u – v.
Distance Function dist(u, v)
Two vectors u and v are orthogonal if their dot product is zero; equivalently, if the sum of the squares of their lengths equals the square of the length of u + v (the Pythagorean Theorem).
Orthogonal Vectors
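Both criteria can be checked with a short sketch (the vectors are illustrative): the dot product is zero exactly when the Pythagorean identity holds.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def is_orthogonal(u, v):
    return dot(u, v) == 0

u, v = [1, 2], [-2, 1]
print(is_orthogonal(u, v))   # True: 1*(-2) + 2*1 = 0

# Equivalent Pythagorean check: ||u||^2 + ||v||^2 == ||u + v||^2
s = [a + b for a, b in zip(u, v)]
print(dot(u, u) + dot(v, v) == dot(s, s))   # True: 5 + 5 == 10
```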
Given a vector z, its orthogonal complement is the subspace consisting of all vectors orthogonal to z.
Orthogonal Complements
1 – The null space of A is the orthogonal complement of the row space of A.
2 – The column space of A is the orthogonal complement of the null space of the transpose of A.
Orthogonal Spaces
How the orthogonal complement of a subspace is notated.
W-Perp
Formula: See image.
Vector Angle Formula
A set of vectors in R^n is said to be orthogonal if each pair of distinct vectors from the set is orthogonal.
Orthogonal Sets
Theorem:
– If S is an orthogonal set of nonzero vectors, then S is linearly independent.
– Hence S is a basis for the subspace spanned by S.

Justification:
– Because every vector in the set is nonzero, all of the weights in a linear combination of the vectors of the orthogonal set must be zero for the linear combination to equal zero. (see image)

Example:
– Think of a three-dimensional coordinate system. The axes are mutually perpendicular, and the corresponding unit vectors form a basis for R^3.

Linear Independence of an Orthogonal Set
An orthogonal basis for a subspace W of R^n is a basis for W that is also an orthogonal set.
Orthogonal Basis
If S is an orthogonal basis for a subspace W in R^n, then for each y in W, the weight of each vector in the linear combination of the vectors of S summing to y is the dot product of y with that vector divided by the dot product of that vector with itself.
Weights of a Linear Combination in an Orthogonal Basis
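A small sketch of this weight formula (the basis and y are illustrative): each weight is c_i = (y . u_i) / (u_i . u_i).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def weights(y, basis):
    """Weight of each orthogonal basis vector: c_i = (y . u_i) / (u_i . u_i)."""
    return [dot(y, u) / dot(u, u) for u in basis]

basis = [[1, 1], [1, -1]]   # an orthogonal basis for R^2
y = [3, 1]
print(weights(y, basis))    # [2.0, 1.0], so y = 2*u1 + 1*u2
```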
Given a nonzero vector u in R^n, consider the problem of decomposing a vector y in R^n into the sum of two vectors, one a multiple of u (y-Hat) and the other orthogonal to u (z).

Example:
– Think of decomposing a velocity vector into its x and y components. y-Hat is the x-component and z is the y-component. They are orthogonal to each other.

Orthogonal Projection (definition)
Given u, this is a scalar (the dot product of y and u divided by the dot product of u and u) multiplied by u.

This scales the u vector so that it becomes the “shadow” of y on the line through u, hence the term “projection”.

y-Hat (Orthogonal Projection of y onto u)
The subtraction of the projection of y onto u (“shadow of y”) from y itself.
z (Component of y orthogonal to u)
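The decomposition y = y-Hat + z can be sketched as follows (the vectors are illustrative): y-Hat is the projection of y onto u, and z = y − y-Hat is orthogonal to u.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(y, u):
    """Orthogonal projection of y onto u: y_hat = ((y . u) / (u . u)) * u."""
    c = dot(y, u) / dot(u, u)
    return [c * x for x in u]

y, u = [7, 6], [4, 2]
y_hat = project(y, u)                   # [8.0, 4.0]
z = [a - b for a, b in zip(y, y_hat)]   # [-1.0, 2.0], the orthogonal component
print(dot(z, u))                        # 0.0: z is orthogonal to u
```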
A set of orthogonal vectors that all have unit length (one).
Orthonormal Sets
An orthogonal basis composed of an orthonormal set.

Example:
– The standard basis for R^n, which is analogous to the basis for a Cartesian coordinate system.

Orthonormal Basis
An m x n matrix U has orthonormal columns if and only if U^T U = I (see image).
Checking for Orthonormal Columns
This consists of normalizing each vector in the orthogonal set.
Converting an Orthogonal Basis into an Orthonormal Basis
An orthogonal matrix is a square invertible matrix U such that the inverse of U is the transpose of U.
Orthogonal Matrix
Projecting a vector y onto a subspace W: if dim(W) = 1, y is projected onto a line, and if dim(W) > 1, y is projected onto a higher-dimensional subspace.
Orthogonal Projections
Whenever y is written as a linear combination of a set S of n vectors in R^n, the terms in the sum can be grouped into two parts so that y can be written as the sum of two vectors. This is particularly useful with an orthogonal set because if W is the subspace of R^n spanned by S, then the above decomposition can be done with just an orthogonal basis for W rather than an orthogonal basis for all of R^n.
Decomposing Linear Combinations into Two Linear Combinations of Orthogonal Subspaces
If W is a subspace of R^n, then each y in R^n can be written uniquely in the form y = y-Hat + z, where y-Hat is in W and z is in W-perp.

In fact, if S is any orthogonal basis of W, then y-Hat is the sum of the projections of y onto each vector in S.

The Orthogonal Decomposition Theorem for Any Orthogonal Basis
For dim(W) > 1, an orthogonal projection is written as (see image in y-Hat). This can then be decomposed by the Orthogonal Decomposition Theorem (see bottom image).
Orthogonal Projection of y onto W (dim(W) > 1)
For dim(W) = 1 an orthogonal projection is written as (see image) with L representing a one-dimensional subspace (a line). This is because the Orthogonal Decomposition Theorem contains only one term.
Orthogonal Projection of y onto L (dim(W) = 1)
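To make the higher-dimensional case concrete, here is a sketch (the basis and y are illustrative): the projection of y onto W is the sum of the one-dimensional projections of y onto each vector of an orthogonal basis for W.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_onto_subspace(y, basis):
    """Sum the projections of y onto each vector of an orthogonal basis for W."""
    y_hat = [0.0] * len(y)
    for u in basis:
        c = dot(y, u) / dot(u, u)
        y_hat = [h + c * x for h, x in zip(y_hat, u)]
    return y_hat

basis = [[2, 5, -1], [-2, 1, 1]]   # orthogonal basis for a plane W in R^3
y = [1, 2, 3]
print(project_onto_subspace(y, basis))   # [-0.4, 2.0, 0.2]
```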
(Sam’s Breakdown)

Notice that in R^3, y forms a skewed square pyramid with Xw, X1w1, and X2w2 as the vertices, and the vectors y, (X1w1 + X2w2), r, X1w1, and X2w2 as the edges.

Given only y, one must first decompose the projection onto W (Xw), so this portion deals with discerning the edges of the skewed square pyramid.

Next, one must decompose the projection of y onto W into the projections of Xw onto X1 and X2, which gives the vertices of the square pyramid.

Now y can be written as a sum of the orthogonal component r and the projections of the projection of y onto W onto X1 and X2, which are essentially the bottom points of the graph.

If that doesn’t make sense, just gather your own understanding from the image.

Geometric Interpretation of the Orthogonal Projection
If y is in the subspace W onto which we would project y, then the projection of y onto W is just y.
Properties of Orthogonal Projections
Let W be a subspace of R^n, y be a vector in R^n, and y-Hat be the orthogonal projection of y onto W. Then y-Hat is the closest point in W to y in the sense that the component of y orthogonal to W (z) is shorter than the distance between y and any other vector v in W.

This makes sense because y, y-Hat, and any v in W form a right triangle: the legs are y-Hat – v (which lies in W) and z = y – y-Hat, and the hypotenuse is y – v. No matter where v is in W, the hypotenuse dist(y, v) is at least as long as the leg z. If v = y-Hat, the length equals that of z, because in that case y – v is z.

The Best Approximation Theorem
If the columns of a matrix U form an orthonormal basis S for a subspace W in R^n, then the projection of y onto W is equal to U multiplied by the transpose of U multiplied by y.

This makes sense because in the Orthogonal Decomposition Theorem, if the length of each vector in S is 1, the denominators disappear and the projection of y onto W becomes the sum, over the distinct vectors in the set, of the dot product of y with that vector multiplied by that vector. This sum is exactly the matrix product U U^T applied to y.

Projecting y onto an Orthogonal Subspace Simplified by an Orthonormal Basis
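A sketch of this simplification (the orthonormal basis and y are illustrative): with unit-length basis vectors, no division is needed, and the sum of projections equals U U^T y.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_orthonormal(y, onb):
    """With orthonormal columns u_i, U U^T y = sum_i (y . u_i) u_i."""
    y_hat = [0.0] * len(y)
    for u in onb:
        c = dot(y, u)   # no division: each u_i has length 1
        y_hat = [h + c * x for h, x in zip(y_hat, u)]
    return y_hat

onb = [[0.6, 0.8]]   # orthonormal basis for a line in R^2
print(project_orthonormal([5, 0], onb))   # approximately [1.8, 2.4]
```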
A simple algorithm for producing an orthogonal or orthonormal basis for any nonzero subspace of R^n. Given a basis S for a nonzero subspace W in R^n, the algorithm of the Gram-Schmidt Process produces an orthogonal basis V for W.

The final line is important because it states that the first k vectors of V span the same subspace as the first k vectors of S, for each k.

The Gram-Schmidt Process
Simply normalize the vectors in the orthogonal basis produced by the Gram-Schmidt Process.
Constructing an Orthonormal Basis from an Orthogonal Basis
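The Gram-Schmidt Process and the normalization step above can be sketched together in plain Python (the input basis is illustrative): each vector has its projections onto the earlier orthogonal vectors subtracted, and each result is then divided by its length.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis):
    """Subtract from each vector its projections onto the earlier ones."""
    ortho = []
    for x in basis:
        v = list(x)
        for u in ortho:
            c = dot(x, u) / dot(u, u)
            v = [vi - c * ui for vi, ui in zip(v, u)]
        ortho.append(v)
    return ortho

def orthonormalize(basis):
    """Gram-Schmidt followed by normalizing each vector."""
    result = []
    for v in gram_schmidt(basis):
        n = math.sqrt(dot(v, v))
        result.append([vi / n for vi in v])
    return result

v1, v2 = gram_schmidt([[3, 1], [2, 2]])
print(v1)            # [3, 1] is kept as the first vector
print(dot(v1, v2))   # ~0 up to rounding: v2 is [2, 2] minus its projection onto v1
```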
Applying the Gram-Schmidt Process to an m x n matrix A that has linearly independent columns amounts to factoring A. This process is called QR Factorization. The aforementioned matrix A can be written as A = QR, where Q is an m x n matrix whose columns form an orthonormal basis for Col(A) and R is an n x n upper triangular invertible matrix with positive entries along the diagonal.

Process:
– Take the columns of A, which form a basis for Col(A).
– Apply the Gram-Schmidt Process to these columns and normalize to get an orthonormal basis for Col(A).
– Concatenate the vectors of the orthonormal basis as the columns of a matrix Q.
– Because the columns of Q are orthonormal, the transpose of Q multiplied by Q is the identity matrix.
– Therefore, Q^T A = Q^T (QR) = IR = R.
– Compute R = Q^T A.
– Then A = QR is the QR factorization of A.

QR Factoring
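The steps above can be sketched in plain Python (the matrix entries are illustrative; production code would use a library routine such as `numpy.linalg.qr`): Gram-Schmidt with normalization gives the columns of Q, and R = Q^T A.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def qr_columns(cols):
    """QR via Gram-Schmidt on the columns of A: A = QR with R = Q^T A."""
    q = []
    for x in cols:
        v = list(x)
        for u in q:
            c = dot(x, u)   # u is already unit length, so no division
            v = [vi - c * ui for vi, ui in zip(v, u)]
        n = math.sqrt(dot(v, v))
        q.append([vi / n for vi in v])
    # entries r_ij = q_i . a_j, i.e. the rows of Q^T times the columns of A
    r = [[dot(qi, a) for a in cols] for qi in q]
    return q, r

cols = [[3, 4], [1, 2]]   # columns of a 2x2 matrix A
q, r = qr_columns(cols)
print(r[1][0])            # ~0: R is upper triangular below the diagonal
```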
Consider an inconsistent matrix equation (one with no solution). The best one can do in this case is find an x that makes Ax as close as possible to b, minimizing the error. A least-squares problem is a problem that involves finding the solution x-Hat of an inconsistent matrix equation Ax = b that minimizes the distance between b and Ax.

Examples:
– Given a set of points, find the best-fit line for the trend of the points: the line that minimizes the sum of the squared vertical distances from the points to the line.

Least-Squares Problems
The solution to the least-squares problem involves:
– Finding the orthogonal projection of b onto Col(A) or b-Hat.
– Creating a consistent matrix equation A(x-Hat) = b-Hat because b-Hat is in the Col(A).
– Finding the component of b orthogonal to the Col(A) or b – (b-Hat).
– Knowing that the component of b orthogonal to Col(A) is orthogonal to every column of A, set b – (b-Hat) = b – A(x-Hat), so that A^T (b – A(x-Hat)) = 0.
– Simplify to show that x-Hat coincides with the nonempty set of solutions of the normal equations, A^T A (x-Hat) = A^T b.
A(x-Hat) = b-Hat (General Least-Squares Problem)
The solution to the normal equations that gives the best approximation, or least-squares solution, to an inconsistent matrix equation Ax = b.

The following statements are equivalent: the equation Ax = b has a unique least-squares solution for each b in R^m; the columns of A are linearly independent; and A^T A is invertible.

x-Hat
The projection of b onto the column space of A.
b-Hat
A system of equations whose nonempty set of solutions coincides with the set of least-squares solutions to Ax = b.
Normal Equations for Ax = b
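A worked sketch of the normal equations (the matrix and b are illustrative, and the solver below is hard-coded for two columns): form A^T A and A^T b, then solve for x-Hat.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def normal_equations(rows, b):
    """Form A^T A and A^T b from the rows of A."""
    cols = list(zip(*rows))   # columns of A
    ata = [[dot(ci, cj) for cj in cols] for ci in cols]
    atb = [dot(ci, b) for ci in cols]
    return ata, atb

def solve2(m, v):
    """Cramer's rule for a 2x2 system m x = v (illustrative helper)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(v[0] * m[1][1] - m[0][1] * v[1]) / det,
            (m[0][0] * v[1] - v[0] * m[1][0]) / det]

a = [[4, 0], [0, 2], [1, 1]]   # an inconsistent 3x2 system Ax = b
b = [2, 0, 11]
ata, atb = normal_equations(a, b)
print(solve2(ata, atb))        # [1.0, 2.0], the least-squares solution x-Hat
```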
The distance from b to A(x-Hat).
Least-Squares Error
Given an m x n matrix A with linearly independent columns, let A = QR be a QR factorization of A. Then, for each b in R^m, the matrix equation Ax = b has a unique least-squares solution x-Hat.
QR Factorization Least-Squares Solution
Computationally, it is easier to solve R(x-Hat) = Q^T b (see image) than to compute the inverse of R.
Easier Way to use QR Factorization to Find a Least-Squares Solution
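A sketch of why this is easier (the numbers are illustrative): R is upper triangular, so the system can be solved by back substitution from the bottom row up, with no matrix inverse.

```python
def back_substitute(r, y):
    """Solve the upper-triangular system R x = y from the bottom up."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = y[i] - sum(r[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / r[i][i]
    return x

# With A = QR, the least-squares solution satisfies R x_hat = Q^T b.
r = [[5.0, 2.2], [0.0, 0.4]]   # illustrative upper-triangular R
qtb = [9.4, 0.8]               # illustrative Q^T b
print(back_substitute(r, qtb))  # [1.0, 2.0]
```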
The inner product can be extended beyond Euclidean space R^n to other vector spaces.

A vector space with an inner product is an inner product space.

Inner Product Spaces
Axioms of the inner product:
– Order does not matter.
– Inner products are expandable.
– Scalars can be pulled from inner products.
– The inner product of a vector with itself is nonnegative, and is zero only if the vector is zero.
Axioms of the Inner Product
The length or norm of a vector is the square root of the inner product of the vector with itself. This does not have to be the root of a sum of squares, because v does not need to be an element of Euclidean space R^n.
Norm/Length (Inner Product Space)
The distance between two vectors is the length of the resultant vector, created by subtracting one from the other.
Distance Between Two Vectors (Inner Product Space)
Vectors are orthogonal if their inner product is zero.
Orthogonality (Inner Product Space)
The Gram-Schmidt Process can determine orthogonal bases for finite-dimensional subspaces of an inner product space, just as in Euclidean space (R^n).
The Gram-Schmidt Process (Inner Product Space)
The problem is to best approximate a function f in V by a function g from a specified subspace W of V. The “closeness” of an approximation of f depends on the way the distance between f and g is defined. We will consider only the case in which the distance between f and g is determined by an inner product. The best approximation to f is the orthogonal projection of f onto the subspace W, written in terms of the basis of W (the g vectors over the index j; see image).
Best Approximation in Inner Product Spaces
Because the length of the projection of v onto the subspace W spanned by u cannot exceed the length of v, for all u, v in V the inner product of u and v will always be less than or equal to the product of the lengths of u and v.
Cauchy-Schwarz Inequality
The implication that the length of u + v will not exceed the sum of the lengths of u and v. This is proven with the Cauchy-Schwarz Inequality.
Triangle Inequality
The space of all continuous functions on the interval a ≤ t ≤ b.
C[a, b]
Its inner product is defined by the integral of f multiplied by g over the interval a ≤ t ≤ b.
An Inner Product for C[a, b]
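This inner product on C[a, b] can be sketched numerically (the midpoint-rule approximation of the integral is an assumption of this example, not part of the notes, and the test functions are illustrative):

```python
def inner_product(f, g, a, b, n=10000):
    """Approximate <f, g> = integral of f(t) g(t) over [a, b] (midpoint rule)."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h) for k in range(n))

# <t, t> on [0, 1] is the integral of t^2 dt, whose exact value is 1/3
print(inner_product(lambda t: t, lambda t: t, 0.0, 1.0))   # approximately 1/3
```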
