Week 9 — Convex Problems, Duality, and Optimization Algorithms

Course 5 syllabus

Overview

With convexity in hand, this week assembles the optimization toolkit. First, the standard problem forms (LP, QP, and the general convex program) and why putting a problem in one of them matters. Then Lagrangian duality and the KKT conditions — the certificate of optimality that turns “I found a point” into “I proved it is optimal.” Then the modeling layers of approximation/fitting and statistical estimation, which is where most ML objectives come from. Finally the core algorithms: gradient descent, Newton’s method, and their equality-constrained versions.

Readings

  • Boyd Ch 4 — Convex optimization problems. Standard form, LP, QP, geometric programming, generalized-inequality constraints.
  • Boyd Ch 5 — Duality. Lagrangian and dual function, weak/strong duality, geometric interpretation, KKT optimality conditions, sensitivity.
  • Boyd Ch 6 — Approximation and fitting. Norm/least-norm problems, regularization, robust approximation.
  • Boyd Ch 7 — Statistical estimation. Maximum-likelihood and MAP as convex problems, optimal detector design, experiment design.
  • Boyd Ch 9–10 — Unconstrained and equality-constrained minimization. Gradient descent, Newton’s method, self-concordance, equality-constrained Newton.

Key Concepts

Standard problem forms

A convex program minimizes a convex \(f_0\) subject to convex inequality constraints \(f_i(x)\le 0\) and affine equality constraints \(Ax=b\). Special cases: linear programs (linear objective and constraints), quadratic programs (convex quadratic objective), second-order cone and semidefinite programs. Recognizing which form your problem fits determines which solver and which theory apply — and least squares (Week 2) is the simplest QP.

Lagrangian duality

Form the Lagrangian \(L(x,\lambda,\nu) = f_0(x) + \sum_i \lambda_i f_i(x) + \nu^\top(Ax-b)\) and the dual function \(g(\lambda,\nu) = \inf_x L\). The dual is always concave and gives a lower bound (weak duality): \(g(\lambda,\nu) \le p^\star\). Under a constraint qualification (e.g. Slater’s condition) strong duality holds, \(d^\star = p^\star\), and the duality gap closes. The dual variables are exactly the sensitivities — the shadow prices of the constraints.

KKT conditions

For a convex problem with strong duality, \(x^\star,\lambda^\star,\nu^\star\) are optimal iff they satisfy the KKT conditions: primal feasibility, dual feasibility \(\lambda^\star \ge 0\), complementary slackness \(\lambda_i^\star f_i(x^\star)=0\), and stationarity

\[\nabla f_0(x^\star) + \sum_i \lambda_i^\star \nabla f_i(x^\star) + A^\top \nu^\star = 0.\]

These generalize “set the gradient to zero” to constrained problems and are the optimality certificate used everywhere downstream.

Approximation, fitting, estimation

Norm approximation \(\min \|Ax-b\|\), regularized forms \(\min \|Ax-b\|_2^2 + \lambda\|x\|\) (ridge/Tikhonov for \(\ell_2\), lasso/sparsity for \(\ell_1\)), and robust formulations all sit here. Maximum likelihood and MAP estimation become convex problems for log-concave models — so the inference of Week 7 is solved by the algorithms below, and ML regularization is a Bayesian prior in disguise.

Algorithms

For unconstrained smooth convex \(f\):

  • Gradient descent \(x_{k+1} = x_k - t_k \nabla f(x_k)\). Convergence rate depends on the condition number \(\kappa\) of the Hessian (Week 3) — ill-conditioning means zig-zagging and slow progress.
  • Newton’s method \(x_{k+1} = x_k - t_k\,[\nabla^2 f(x_k)]^{-1}\nabla f(x_k)\) — affine-invariant, quadratically convergent near the optimum, analyzed cleanly via self-concordance. The Newton step solves a linear system (Weeks 3–4).

For equality-constrained problems, the KKT system is solved at each step (equality-constrained Newton, infeasible-start Newton); inequality constraints are handled by interior-point/barrier methods (read lightly). Step sizes come from backtracking line search.

Connections

  • Forward: The maximum-entropy problems of Week 10 are convex programs solved by exactly this duality machinery.
  • Across courses: This is the engine under training (Courses 1, 3) — losses, regularizers, and the optimizers that minimize them — and under any model-predictive or trajectory optimization (Courses 1, 2).

Further Reading

  • Boyd & Vandenberghe, Convex Optimization, Chapters 4–7, 9–10.
  • Nocedal & Wright, Numerical Optimization, for the algorithmic details and line-search theory.