Week 2 — Inner Products, Orthogonality, the Spectral Theorem, and SVD

Course 5 syllabus

Overview

This is the most reused week in the course. Inner products give length and angle; orthonormal bases make projection trivial; orthogonal projection is least squares; the spectral theorem says symmetric/self-adjoint operators have orthonormal eigenbases with real eigenvalues; and the SVD extends all of that to arbitrary rectangular matrices. PCA, least squares, covariance, low-rank approximation, and the conditioning theory of Week 3 are all corollaries of what is below.

Readings

  • Axler 6.A — Inner products and norms. Inner product axioms, Cauchy–Schwarz, triangle inequality, the induced norm.
  • Axler 6.B — Orthonormal bases. Gram–Schmidt, existence of orthonormal bases, Riesz representation.
  • Axler 6.C — Orthogonal complements and minimization. Orthogonal decomposition, the projection that solves the least-distance problem.
  • Axler 7.B — Spectral theorem. Real (self-adjoint) and complex (normal) versions.
  • Axler 7.E — Singular value decomposition. Singular values, the SVD of any linear map.
  • Axler Ch 9 — Determinants (determinant-oriented material): determinant of an operator, relation to eigenvalues and volume.

Key Concepts

Inner products, norms, Cauchy–Schwarz

An inner product \(\langle\cdot,\cdot\rangle\) is linear in the first slot, conjugate-symmetric, and positive definite. It induces the norm \(\|v\| = \sqrt{\langle v,v\rangle}\) and the Cauchy–Schwarz inequality

\[|\langle u, v\rangle| \le \|u\|\,\|v\|,\]

with equality iff \(u,v\) are linearly dependent. The triangle inequality follows.
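A quick numerical check can make the inequality and its equality case concrete. The sketch below assumes the standard dot product on \(\mathbb{R}^3\) as the inner product; the helper names `inner` and `norm` and the sample vectors are illustrative choices, not part of the readings.

```python
import math

def inner(u, v):
    """Standard dot product on R^n (one possible inner product)."""
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    """Norm induced by the inner product: sqrt(<v, v>)."""
    return math.sqrt(inner(v, v))

u = [1.0, 2.0, 2.0]
v = [2.0, 0.0, 1.0]
w = [2.0, 4.0, 4.0]  # w = 2u, so u and w are linearly dependent

# Linearly independent pair: |<u, v>| is strictly below ||u|| ||v||.
lhs, rhs = abs(inner(u, v)), norm(u) * norm(v)
print("independent:", lhs, "<", rhs)

# Linearly dependent pair: the two sides agree (up to rounding).
gap = norm(u) * norm(w) - abs(inner(u, w))
print("dependent gap:", gap)

# Triangle inequality for the same u, v.
u_plus_v = [a + b for a, b in zip(u, v)]
tri_ok = norm(u_plus_v) <= norm(u) + norm(v)
print("triangle inequality holds:", tri_ok)
```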

Orthonormal bases and Gram–Schmidt

A list \(e_1,\dots,e_n\) is orthonormal if \(\langle e_i, e_j\rangle = \delta_{ij}\). If the list is also a basis of \(V\), then every \(v \in V\) satisfies \(v = \sum_i \langle v, e_i\rangle e_i\) and \(\|v\|^2 = \sum_i |\langle v, e_i\rangle|^2\) (Parseval). Gram–Schmidt converts any basis \(v_1,\dots,v_n\) into an orthonormal one by successively subtracting projections:

\[e_k = \frac{v_k - \sum_{j<k}\langle v_k, e_j\rangle e_j}{\big\| v_k - \sum_{j<k}\langle v_k, e_j\rangle e_j \big\|}.\]

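Applied to the columns of a matrix \(A\) with linearly independent columns, this is the QR factorization \(A = QR\) viewed abstractly: the \(e_k\) are the columns of \(Q\), and the coefficients \(\langle v_k, e_j\rangle\) fill the upper-triangular \(R\) (Week 3).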
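The formula above translates almost line for line into code. The following is a minimal sketch of classical Gram–Schmidt for real vectors with the standard dot product; the function name `gram_schmidt`, the tolerance `1e-12`, and the sample vectors are assumptions made for illustration. In floating point the classical variant can lose orthogonality on ill-conditioned inputs, so it is shown here only to mirror the formula.

```python
import math

def inner(u, v):
    """Standard dot product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list (classical Gram-Schmidt)."""
    basis = []
    for v in vectors:
        # Coefficients <v_k, e_j> for every e_j found so far.
        coeffs = [inner(v, e) for e in basis]
        # Subtract the projections of v_k onto e_1, ..., e_{k-1}.
        r = list(v)
        for c, e in zip(coeffs, basis):
            r = [ri - c * ei for ri, ei in zip(r, e)]
        length = math.sqrt(inner(r, r))
        if length < 1e-12:
            raise ValueError("input vectors are (numerically) linearly dependent")
        basis.append([ri / length for ri in r])
    return basis

vs = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
es = gram_schmidt(vs)

# Gram matrix of the output; it should be close to the identity.
gram = [[inner(ei, ej) for ej in es] for ei in es]

# Parseval check for an arbitrary vector x.
x = [3.0, -1.0, 2.0]
parseval_lhs = inner(x, x)
parseval_rhs = sum(inner(x, e) ** 2 for e in es)
print(parseval_lhs, parseval_rhs)
```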

Orthogonal complements and least squares

For a subspace \(U\), \(V = U \oplus U^\perp\), and every \(v\) splits uniquely as \(v = P_U v + P_{U^\perp} v\). The orthogonal projection \(P_U v\) is the unique closest point of \(U\) to \(v\):

\[\|v - P_U v\| \le \|v - u\| \quad \text{for all } u \in U.\]

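Applied to \(U = \operatorname{range}(A)\), this gives the least-squares solution of \(Ax \approx b\): \(\hat{x}\) minimizes \(\|b - Ax\|\) exactly when \(A\hat{x} = P_U b\), that is, when the residual \(b - A\hat{x}\) is orthogonal to the columns of \(A\). This orthogonality condition is the system of normal equations \(A^\top A \hat{x} = A^\top b\), which has a unique solution when \(A\) has full column rank.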
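As a small illustration, the sketch below fits a line to four made-up data points (the values of `t` and `b` are hypothetical). It solves the normal equations directly and compares the result with NumPy's `np.linalg.lstsq`. Solving the normal equations is done here only to mirror the derivation and is not meant as a recommended numerical method.

```python
import numpy as np

# Hypothetical data: fit b ~ c0 + c1 * t.
t = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([1.0, 2.9, 5.1, 7.0])
A = np.column_stack([np.ones_like(t), t])  # columns span range(A)

# Normal equations: A^T A x = A^T b (A has full column rank here).
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Library least-squares solver for comparison.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# The residual should be orthogonal to every column of A.
residual = b - A @ x_normal
print("coefficients:", x_normal)
print("A^T residual:", A.T @ residual)
```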

Spectral theorem

Let \(V\) be finite-dimensional. If \(T\) is self-adjoint (\(T = T^*\)) on a real or complex inner product space, or normal (\(TT^* = T^*T\)) on a complex one, then \(V\) has an orthonormal basis of eigenvectors of \(T\); in the self-adjoint case the eigenvalues are real. In matrix terms, a real symmetric \(A\) factors as

\[A = Q \Lambda Q^\top, \qquad Q^\top Q = I, \quad \Lambda = \operatorname{diag}(\lambda_1,\dots,\lambda_n).\]

This is the foundation of PCA (eigendecomposition of a covariance matrix) and of second-order optimality (the Hessian, Week 9).
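The factorization can be checked numerically with `np.linalg.eigh`, NumPy's eigensolver for symmetric/Hermitian matrices. The matrix below is an arbitrary symmetric example chosen for illustration.

```python
import numpy as np

# An arbitrary real symmetric matrix (illustrative choice).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh returns real eigenvalues in ascending order and
# orthonormal eigenvectors as the columns of Q.
eigvals, Q = np.linalg.eigh(A)
Lam = np.diag(eigvals)

print("eigenvalues:", eigvals)
print("Q^T Q = I:", np.allclose(Q.T @ Q, np.eye(3)))
print("A = Q Lam Q^T:", np.allclose(Q @ Lam @ Q.T, A))
```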

Singular value decomposition

Every \(A \in \mathbb{R}^{m\times n}\) factors as

\[A = U \Sigma V^\top, \qquad U^\top U = I_m,\; V^\top V = I_n,\; \Sigma = \begin{bmatrix} \operatorname{diag}(\sigma_1,\dots,\sigma_r) & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{m\times n}, \quad \sigma_1 \ge \cdots \ge \sigma_r > 0,\]

where \(r = \operatorname{rank}(A)\). The singular values \(\sigma_i\) are the square roots of the nonzero eigenvalues of \(A^\top A\); the columns of \(V\) and \(U\) are the right and left singular vectors. The SVD gives the four fundamental subspaces, the 2-norm \(\|A\|_2 = \sigma_1\), the rank, the pseudoinverse, and the best rank-\(k\) approximation (Eckart–Young): truncating to the top \(k\) singular triples gives \(A_k\), which minimizes \(\|A - B\|\) over all \(B\) of rank at most \(k\) in both the spectral and Frobenius norms.
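These claims are easy to verify numerically. The sketch below uses `np.linalg.svd` on a small made-up matrix; it requests the thin SVD (`full_matrices=False`), so `U` is \(m \times n\) here rather than square. The matrix entries and the choice \(k = 1\) are assumptions for illustration.

```python
import numpy as np

# A small rectangular matrix (illustrative values).
A = np.array([[3.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])

# Thin SVD: U is 4x3, s holds the singular values (descending), Vt is 3x3.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Singular values vs. square roots of the eigenvalues of A^T A.
eig_AtA = np.linalg.eigvalsh(A.T @ A)[::-1]  # reorder to descending
print("sigma:          ", s)
print("sqrt(eig(A^T A)):", np.sqrt(eig_AtA))

# 2-norm equals the largest singular value.
two_norm = np.linalg.norm(A, 2)

# Rank-k truncation: keep the top k singular triples.
k = 1
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

# Eckart-Young error values: sigma_{k+1} in the spectral norm,
# sqrt(sum of the remaining sigma_i^2) in the Frobenius norm.
spec_err = np.linalg.norm(A - A_k, 2)
frob_err = np.linalg.norm(A - A_k, "fro")
print("spectral error:", spec_err, "Frobenius error:", frob_err)
```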

Determinants

The determinant is the unique alternating multilinear function of the columns with \(\det I = 1\). Key facts: \(\det(AB) = \det A \det B\), \(\det A = \prod_i \lambda_i\) (the product of the eigenvalues, counted with algebraic multiplicity over \(\mathbb{C}\)), \(|\det A|\) is the volume-scaling factor of the map, and \(A\) is invertible iff \(\det A \neq 0\). The determinant connects eigenvalues, orientation, and the change-of-variables Jacobian used in probability (Week 6).
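A short numerical check of the key facts, using `np.linalg.det` and `np.linalg.eigvals` on small made-up matrices (the entries of `A`, `B`, and `S` are illustrative):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[0.0, -1.0],
              [2.0,  1.0]])

# Multiplicativity: det(AB) = det(A) det(B).
det_A = np.linalg.det(A)
det_B = np.linalg.det(B)
det_AB = np.linalg.det(A @ B)
print("det(AB):", det_AB, "det(A) det(B):", det_A * det_B)

# Determinant as the product of the eigenvalues.
eig_prod = np.prod(np.linalg.eigvals(A))
print("product of eigenvalues:", eig_prod)

# A singular matrix (second column is twice the first) has determinant 0
# and fails to have full rank.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
det_S = np.linalg.det(S)
rank_S = np.linalg.matrix_rank(S)
print("det(S):", det_S, "rank(S):", rank_S)
```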

Connections

  • Forward: Week 3 computes QR, projections, and the SVD numerically and analyzes their stability; Week 9 uses the symmetric eigenstructure of the Hessian for Newton’s method; Week 10 uses the Gaussian (whose covariance is symmetric PSD) as the max-entropy distribution.
  • Across courses: SVD/PCA for dimensionality reduction and pose estimation (Courses 1–2), spectral methods and embedding geometry (Course 3).

Further Reading

  • Axler, Linear Algebra Done Right, 4th ed., Chapters 6, 7, and 9.
  • Trefethen & Bau, Numerical Linear Algebra, Lectures 4–5 for the SVD from the numerical side.