Week 2 — AV Scene Representation, Coordinate Frames, and Trajectory Geometry

Course 1 syllabus

Overview

An autonomy stack is, at its core, a machine that reasons about geometry over time. Before any prediction or planning can happen, you need a clean representation of the scene (where the ego vehicle is, where other agents are, and what frame each is expressed in) and a precise language for motion: position, velocity, acceleration, jerk, curvature, and arc length. This week builds that representation and that language. Coordinate-frame errors are among the most common sources of bugs in real robotics code, so we pin down conventions now and never deviate.

Course 5 already gave you linear maps, change of basis, and the differential geometry of curves; this week is the applied layer on top. You will represent a synthetic AV scene with an ego frame and agent frames, transform between them, and express trajectories as time-parameterized curves whose geometric properties (curvature, smoothness) become the feasibility and comfort metrics that Week 6’s planner consumes.

Readings

  • Pressley (review): curves, arc length, tangent vectors, curvature. Extract: curvature \(\kappa\) as the rate of turning of the unit tangent, and arc-length parameterization.
  • Morin (review): position, velocity, acceleration, relative motion. Extract: kinematics in a moving (ego) frame.
  • MIT Machine Vision (skim): image formation and perspective projection. Extract: how a world point maps to a camera, for later vision context.
  • (Linear maps, change of basis, and homogeneous transforms: assumed from Course 5.)

Key Concepts

Frames and transforms

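Adopt a fixed convention: a right-handed world frame and a body frame with \(+x\) forward, \(+y\) left, \(+z\) up (ISO 8855). A pose is a rigid transform \(T_{A\to B}=\begin{bmatrix}R & t\\ 0 & 1\end{bmatrix}\) with \(R\in SO(3)\); it maps coordinates expressed in frame \(A\) to coordinates expressed in frame \(B\), and its inverse is \(T_{A\to B}^{-1}=T_{B\to A}=\begin{bmatrix}R^\top & -R^\top t\\ 0 & 1\end{bmatrix}\). Expressing an agent’s state in the ego frame is left-multiplication by \(T_{world\to ego}=T_{ego\to world}^{-1}\), for example \(T_{agent\to ego}=T_{world\to ego}\,T_{agent\to world}\). Read a product right-to-left: the destination frame of each factor must match the source frame of the factor to its left, so the composition order follows from the subscripts. Mismatched frames are the classic integration bug; a single documented convention prevents most of them.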
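The transform algebra above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's common/geometry module: the helper names `make_pose` and `invert_pose` are hypothetical, and rotation is restricted to yaw about \(+z\), which is an assumption that suits planar driving scenes.

```python
import numpy as np

def make_pose(x, y, yaw, z=0.0):
    """Build T_{body->world} as a 4x4 homogeneous matrix (yaw-only rotation, an assumption)."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = [x, y, z]
    return T

def invert_pose(T):
    """Invert a rigid transform using R^T and -R^T t instead of a general matrix inverse."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv

# Ego at (10, 5) heading along world +y; agent 3 m further along world +y, same heading.
T_ego_to_world = make_pose(10.0, 5.0, np.pi / 2)
T_agent_to_world = make_pose(10.0, 8.0, np.pi / 2)

# Read right-to-left: agent -> world, then world -> ego.
T_world_to_ego = invert_pose(T_ego_to_world)
T_agent_to_ego = T_world_to_ego @ T_agent_to_world

agent_xy_in_ego = T_agent_to_ego[:2, 3]
agent_yaw_in_ego = np.arctan2(T_agent_to_ego[1, 0], T_agent_to_ego[0, 0])
print(agent_xy_in_ego, agent_yaw_in_ego)
```

In this example the agent sits directly ahead of the ego, so its ego-frame position lies on the \(+x\) (forward) axis with zero relative yaw.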

Trajectories as time-indexed geometry

A trajectory is a curve \(\gamma(t)=(x(t),y(t))\). Velocity \(\dot\gamma\), acceleration \(\ddot\gamma\), and jerk \(\dddot\gamma\) are its derivatives; arc length \(s(t)=\int_0^t\|\dot\gamma\|\,d\tau\) reparameterizes by distance traveled. Curvature

\[ \kappa = \frac{\dot x\ddot y - \dot y\ddot x}{(\dot x^2+\dot y^2)^{3/2}} \]

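measures how sharply the path turns. It is signed (positive for a left turn under the \(+y\)-left convention), and the reciprocal of its magnitude is the turning radius. These quantities are not academic: lateral acceleration \(\approx v^2\kappa\), with \(v=\|\dot\gamma\|\), is a comfort/feasibility limit, and limits on jerk keep the ride smooth.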
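As a quick sanity check of the curvature formula, the sketch below evaluates it on a circle of radius \(R\) driven at constant speed \(v\), where the expected values are \(\kappa=1/R\) and \(a_\text{lat}=v^2/R\). The function name `signed_curvature` and the numbers \(R=50\) m, \(v=15\) m/s are illustrative assumptions.

```python
import numpy as np

def signed_curvature(xd, yd, xdd, ydd):
    """Planar curvature from first and second time derivatives of x(t), y(t)."""
    return (xd * ydd - yd * xdd) / (xd**2 + yd**2) ** 1.5

# Counterclockwise (left-turning) circle: x = R cos(wt), y = R sin(wt), with w = v / R.
R, v = 50.0, 15.0          # metres, metres per second (illustrative values)
omega = v / R
t = np.linspace(0.0, 5.0, 11)

# Analytic derivatives of the circle parameterization.
xd = -R * omega * np.sin(omega * t)
yd = R * omega * np.cos(omega * t)
xdd = -R * omega**2 * np.cos(omega * t)
ydd = -R * omega**2 * np.sin(omega * t)

kappa = signed_curvature(xd, yd, xdd, ydd)
speed = np.hypot(xd, yd)
a_lat = speed**2 * kappa   # lateral acceleration, m/s^2

print(kappa[0], a_lat[0])
```

A clockwise circle would flip the sign of both `kappa` and `a_lat`, which is why comfort limits are usually applied to the magnitude.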

Scene representation

Represent the scene as: an ego state (pose + velocity), a set of agent states in a shared frame, and a lightweight map (lane centerlines as polylines/splines). Keep everything in one world frame for storage and transform into the ego frame on demand. This is the data model that prediction (Week 4) and planning (Week 6) will read.
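One possible shape for this data model is sketched below. The class and field names (`AgentState`, `Scene`, `in_ego_frame`) are hypothetical, the scene is planar, and velocities are world-frame vectors re-expressed in ego axes rather than velocities relative to the ego; all of these are assumptions made for illustration.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class AgentState:
    x: float      # position, m
    y: float
    yaw: float    # heading, rad
    vx: float     # velocity, m/s
    vy: float

@dataclass
class Scene:
    ego: AgentState
    agents: dict = field(default_factory=dict)  # agent id -> AgentState (world frame)
    lanes: list = field(default_factory=list)   # each lane: (N, 2) array of centerline points

    def in_ego_frame(self):
        """Return a copy of the scene with every quantity expressed in the ego frame."""
        c, s = np.cos(self.ego.yaw), np.sin(self.ego.yaw)
        R_world_to_ego = np.array([[c, s], [-s, c]])   # transpose of the ego rotation
        origin = np.array([self.ego.x, self.ego.y])

        def convert(a):
            p = R_world_to_ego @ (np.array([a.x, a.y]) - origin)
            vel = R_world_to_ego @ np.array([a.vx, a.vy])  # rotate only, no translation
            dyaw = a.yaw - self.ego.yaw
            yaw = np.arctan2(np.sin(dyaw), np.cos(dyaw))   # wrap to (-pi, pi]
            return AgentState(p[0], p[1], yaw, vel[0], vel[1])

        return Scene(
            ego=convert(self.ego),
            agents={k: convert(a) for k, a in self.agents.items()},
            lanes=[(lane - origin) @ R_world_to_ego.T for lane in self.lanes],
        )

# Synthetic scene: ego heading along world +y, one agent 20 m ahead, two straight lanes.
scene = Scene(
    ego=AgentState(10.0, 5.0, np.pi / 2, 0.0, 8.0),
    agents={"car_1": AgentState(10.0, 25.0, np.pi / 2, 0.0, 6.0)},
    lanes=[np.array([[10.0, 0.0], [10.0, 50.0]]), np.array([[6.5, 0.0], [6.5, 50.0]])],
)
local = scene.in_ego_frame()
print(local.agents["car_1"])
```

Storing in the world frame and converting on demand keeps a single source of truth; the ego-frame copy is a derived view that can be rebuilt every tick.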

Theory Exercises

  1. Given \(T_{ego\to world}\) and an agent pose in world, derive the agent pose in the ego frame; verify with a numeric example.
  2. Derive the planar curvature formula from \(\gamma(t)\) and show \(a_\text{lat}=v^2\kappa\).
  3. For a constant-curvature (circular) arc, as used to approximate a short clothoid segment, compute arc length, heading change, and lateral acceleration at speed \(v\).
  4. Show that arc-length reparameterization makes \(\|\gamma'(s)\|=1\) and why that simplifies curvature.
  5. Compute jerk for a quintic-polynomial lane-change trajectory and identify where it peaks.
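A hand derivation for Exercise 5 can be checked numerically. The sketch below assumes one specific quintic, the rest-to-rest lateral profile \(y(t)=d\,(10\tau^3-15\tau^4+6\tau^5)\) with \(\tau=t/T\), which has zero lateral velocity and acceleration at both ends; the lane width \(d=3.5\) m and duration \(T=5\) s are illustrative values, and other boundary conditions give a different polynomial.

```python
import numpy as np
from numpy.polynomial import Polynomial

d, T = 3.5, 5.0   # lateral offset (m) and maneuver duration (s), illustrative values

# Normalized quintic in tau = t / T (coefficients in increasing power of tau).
shape = Polynomial([0.0, 0.0, 0.0, 10.0, -15.0, 6.0])

tau = np.linspace(0.0, 1.0, 501)
t = tau * T
y = d * shape(tau)

# Chain rule: each time derivative contributes a factor 1/T.
jerk = d / T**3 * shape.deriv(3)(tau)

peak_index = int(np.argmax(np.abs(jerk)))
peak_time = t[peak_index]
peak_jerk = abs(jerk[peak_index])
print(f"peak |jerk| = {peak_jerk:.3f} m/s^3 at t = {peak_time:.2f} s")
```

Because jerk scales with \(d/T^3\), stretching the maneuver duration is a far more effective way to reduce peak jerk than narrowing the lateral offset.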

Implementation

Build common/geometry (transforms, pose composition/inverse) and common/trajectory (a sampled trajectory with finite-difference velocity, acceleration, and jerk, plus curvature from the closed-form formula) in both C++ and Python. Construct a small synthetic scene: ego + a few agents + a couple of lane centerlines. Provide a function to express the whole scene in the ego frame and a matplotlib visualization.
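On the Python side, the sampled-trajectory metrics might look like the sketch below. It uses `numpy.gradient` for the finite differences; the function name `trajectory_metrics` and the returned dictionary layout are hypothetical, not the required interface of common/trajectory.

```python
import numpy as np

def trajectory_metrics(t, xy):
    """Finite-difference metrics for a sampled planar trajectory.

    t: (N,) timestamps in seconds; xy: (N, 2) positions in metres.
    """
    vel = np.gradient(xy, t, axis=0)     # central differences in the interior
    acc = np.gradient(vel, t, axis=0)
    jerk = np.gradient(acc, t, axis=0)
    speed = np.linalg.norm(vel, axis=1)
    cross = vel[:, 0] * acc[:, 1] - vel[:, 1] * acc[:, 0]
    kappa = cross / np.maximum(speed, 1e-9) ** 3   # guard against division by zero at standstill
    return {"vel": vel, "acc": acc, "jerk": jerk, "speed": speed,
            "kappa": kappa, "a_lat": speed**2 * kappa}

# Known arc: radius 50 m at 15 m/s, so the analytic curvature is 1/50.
R, v = 50.0, 15.0
t = np.linspace(0.0, 4.0, 401)
xy = np.column_stack([R * np.cos(v / R * t), R * np.sin(v / R * t)])

m = trajectory_metrics(t, xy)
interior = slice(2, -2)   # endpoints use one-sided differences and are less accurate
max_err = np.max(np.abs(m["kappa"][interior] - 1.0 / R))
print(f"max curvature error in the interior: {max_err:.2e}")
```

Each differentiation pass degrades the samples nearest the ends, so jerk is trustworthy over a slightly smaller interior window than curvature.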

Benchmark

Measure transform/compose throughput and trajectory-metric computation cost (curvature/jerk over an N-point trajectory) in C++ vs Python. This sets the cost baseline for the prediction/planning loop. Verify numerical curvature against the analytic value for a known arc.

Expected baselines: a single transform composition takes on the order of nanoseconds in C++; the Python path is dominated by interpreter overhead. Record the ratio, since the runtime loop (Week 10) will need the C++ path.
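A minimal Python-side harness for these measurements could use the standard-library `timeit` module, as sketched below. The helper names (`seconds_per_call`, `yaw_pose`, `curvature_and_jerk`), the repeat counts, and \(N=1000\) are assumptions; the printed numbers depend entirely on the machine, so none are quoted here.

```python
import timeit
import numpy as np

def seconds_per_call(fn, number, repeat=5):
    """Best-of-`repeat` wall time per call, which reduces noise from other processes."""
    return min(timeit.repeat(fn, number=number, repeat=repeat)) / number

def yaw_pose(x, y, yaw):
    """4x4 homogeneous transform with yaw-only rotation (planar assumption)."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    T[:2, 3] = [x, y]
    return T

def curvature_and_jerk(t, xy):
    """Finite-difference curvature and jerk over an N-point trajectory."""
    vel = np.gradient(xy, t, axis=0)
    acc = np.gradient(vel, t, axis=0)
    jerk = np.gradient(acc, t, axis=0)
    speed = np.linalg.norm(vel, axis=1)
    kappa = (vel[:, 0] * acc[:, 1] - vel[:, 1] * acc[:, 0]) / np.maximum(speed, 1e-9) ** 3
    return kappa, jerk

# Benchmark 1: composing two poses.
A, B = yaw_pose(10.0, 5.0, 0.3), yaw_pose(-2.0, 1.0, -0.1)
compose_s = seconds_per_call(lambda: A @ B, number=20000)

# Benchmark 2: trajectory metrics over N points on a circular arc.
N = 1000
t = np.linspace(0.0, 10.0, N)
xy = np.column_stack([50.0 * np.cos(0.3 * t), 50.0 * np.sin(0.3 * t)])
metrics_s = seconds_per_call(lambda: curvature_and_jerk(t, xy), number=200)

print(f"compose: {compose_s * 1e9:.0f} ns per call")
print(f"metrics (N={N}): {metrics_s * 1e6:.1f} us per call")
```

Reporting the minimum over several repeats is a deliberate choice: it approximates the cost of the code itself rather than the average load on the machine, which makes the C++ to Python ratio more reproducible.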

Connections

This representation and these metrics are consumed directly by Week 4 (predicting agent trajectories), Week 6 (curvature/jerk as planner feasibility and cost terms), and the Week 10 runtime (ego state in a control loop). The frame discipline established here is reused for sensor and IMU data in Weeks 8–9. Course 5’s differential geometry and linear algebra are the theory; this is the engineering instantiation.