Week 4 — Probabilistic, Multimodal Trajectory Prediction

Overview

The future is not a single number. An agent approaching an intersection might go straight, turn left, or turn right, and a prediction system that outputs one averaged trajectory is worse than useless — it points at the empty space between the modes. This week is about representing and scoring multimodal, probabilistic predictions: a distribution over possible futures, not a point estimate. Course 5 gave you probability, expectation, and information theory; here you apply them to the specific structure of motion forecasting.

You will build the prediction representation that Week 5 will learn and Week 6’s planner will consume: a set of candidate trajectories, each with a probability, plus a way to score predictions against ground truth that respects multimodality. Getting the metrics right matters as much as the model — a metric that rewards mode-averaging will train a model that does exactly the wrong thing.

Readings

CS231n: linear classifiers and softmax. Extract: softmax as a probability over discrete modes and the cross-entropy loss.
PR (Probabilistic Robotics): motion models and belief representations. Extract: how uncertainty in motion is represented and propagated.
Embedded AI: model-drift skim. Extract: why a deployed predictor’s input distribution shifts over time.
(Probability, expectation, variance, entropy, and KL divergence: assumed from Course 5.)

Key Concepts

Why multimodal

Conditioned on the past, the future distribution \(p(\text{future}\mid\text{past})\) is genuinely multimodal: the modes correspond to discrete intentions (turn vs straight). The mean of a bimodal distribution lies between the modes, in a region the agent is unlikely to ever occupy, so it is the worst place to put a single prediction. We therefore represent the prediction as a small set of modes \(\{(\tau_k, \pi_k)\}\): trajectory \(\tau_k\) with probability \(\pi_k\), where the \(\pi_k\) come from a softmax over per-mode scores so that \(\sum_k\pi_k=1\).
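The argument above can be checked numerically. The following is a minimal sketch, assuming a toy 2-D scene in which an agent either continues straight or turns left with equal probability; the helper names (`softmax`, `mean_trajectory`, `endpoint_distance`) and the waypoints are illustrative assumptions, not part of the course codebase.

```python
import math

def softmax(logits):
    """Map unnormalised mode scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_trajectory(modes, probs):
    """Probability-weighted average of the mode trajectories, waypoint by waypoint."""
    horizon = len(modes[0])
    return [
        (
            sum(p * traj[t][0] for traj, p in zip(modes, probs)),
            sum(p * traj[t][1] for traj, p in zip(modes, probs)),
        )
        for t in range(horizon)
    ]

def endpoint_distance(a, b):
    """Euclidean distance between the final waypoints of two trajectories."""
    return math.dist(a[-1], b[-1])

# Two hypothetical intentions at an intersection, as (x, y) waypoints in metres.
straight = [(0.0, 5.0), (0.0, 10.0), (0.0, 15.0)]
left_turn = [(0.0, 5.0), (-3.0, 9.0), (-8.0, 10.0)]

probs = softmax([0.0, 0.0])  # equal scores give equal probabilities
averaged = mean_trajectory([straight, left_turn], probs)

print("mode probabilities:", probs)
print("averaged endpoint:", averaged[-1])
print("distance to straight endpoint:", endpoint_distance(averaged, straight))
print("distance to left-turn endpoint:", endpoint_distance(averaged, left_turn))
```

In this toy scene the averaged endpoint sits several metres from both real endpoints, so it matches neither intention.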

Scoring multimodal predictions

Standard metrics, all over \(K\) modes:

minADE / minFDE (minimum average/final displacement error): error of the best of the \(K\) predicted modes against ground truth — rewards covering the right future without penalizing the others.
Miss rate: fraction of cases where no mode is within a distance threshold of ground truth (typically measured at the final position).
NLL / cross-entropy: penalizes assigning low probability to the realized mode, which pushes the \(\pi_k\) toward calibration.

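Scoring with the mean ADE over the \(K\) modes instead of the minimum would reward mode-averaging, because every mode is then penalized for being far from the one realized future and nothing rewards covering distinct outcomes; this is the central metric-design lesson.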
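A minimal sketch of these metrics, assuming trajectories are equal-length lists of (x, y) points in metres. The function names, the 2 m miss threshold, the endpoint-based miss convention, and the choice of "realized mode" as the mode with the lowest ADE are all assumptions made for illustration. The example compares a spread-out mode set with one collapsed onto the averaged trajectory: the collapsed set is never worse under mean ADE (a consequence of the triangle inequality), while minADE and miss rate separate the two clearly.

```python
import math

def ade(pred, truth):
    """Average displacement error over all timesteps."""
    return sum(math.dist(p, g) for p, g in zip(pred, truth)) / len(truth)

def fde(pred, truth):
    """Final displacement error: distance between the last waypoints."""
    return math.dist(pred[-1], truth[-1])

def min_ade(modes, truth):
    return min(ade(m, truth) for m in modes)

def min_fde(modes, truth):
    return min(fde(m, truth) for m in modes)

def mean_ade(modes, truth):
    """The metric to avoid: averages over modes instead of taking the best one."""
    return sum(ade(m, truth) for m in modes) / len(modes)

def is_miss(modes, truth, threshold=2.0):
    """Assumed convention: a miss when no mode ends within `threshold` metres."""
    return min_fde(modes, truth) > threshold

def mode_nll(modes, probs, truth, eps=1e-9):
    """NLL of the mode closest to ground truth (by ADE), taken as the realized mode."""
    k = min(range(len(modes)), key=lambda i: ade(modes[i], truth))
    return -math.log(max(probs[k], eps))

# Hypothetical scene: the agent actually goes straight.
straight = [(0.0, 5.0), (0.0, 10.0), (0.0, 15.0)]
left_turn = [(0.0, 5.0), (-3.0, 9.0), (-8.0, 10.0)]
averaged = [(0.0, 5.0), (-1.5, 9.5), (-4.0, 12.5)]  # waypoint-wise mean of the two
truth = straight

spread = [straight, left_turn]     # covers both intentions
collapsed = [averaged, averaged]   # both modes sit between the intentions

for name, modes in [("spread", spread), ("collapsed", collapsed)]:
    print(name,
          "meanADE=%.2f" % mean_ade(modes, truth),
          "minADE=%.2f" % min_ade(modes, truth),
          "minFDE=%.2f" % min_fde(modes, truth),
          "miss=%s" % is_miss(modes, truth))

print("NLL of realized mode at pi=0.5: %.3f" % mode_nll(spread, [0.5, 0.5], truth))
```

Evaluating a whole scenario set then amounts to averaging `min_ade` and `min_fde` and taking the mean of `is_miss` over scenarios.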

Calibration and uncertainty

A predictor’s probabilities should be calibrated: events it calls 70% likely should happen ~70% of the time. KL divergence and reliability diagrams (Course 5 tools) quantify this. Aleatoric uncertainty (inherent randomness, which more data cannot remove) differs from epistemic uncertainty (model ignorance, which more data can reduce); the planner treats them differently.
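A reliability diagram boils down to binning predictions by confidence and comparing each bin's mean confidence with how often the event actually happened. The sketch below assumes equal-width bins and uses made-up toy counts; `reliability_bins` and `weighted_gap` are hypothetical helper names (the count-weighted gap is the quantity commonly reported as expected calibration error).

```python
def reliability_bins(confidences, outcomes, n_bins=5):
    """Group predictions by confidence and compare confidence to observed frequency.

    confidences: predicted probabilities in [0, 1]
    outcomes:    1 if the predicted event happened, else 0
    Returns one (mean_confidence, observed_frequency, count) row per non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for c, o in zip(confidences, outcomes):
        idx = min(int(c * n_bins), n_bins - 1)  # c == 1.0 falls in the last bin
        bins[idx].append((c, o))
    rows = []
    for members in bins:
        if members:
            n = len(members)
            rows.append((sum(c for c, _ in members) / n,
                         sum(o for _, o in members) / n,
                         n))
    return rows

def weighted_gap(rows):
    """Count-weighted average of |confidence - frequency| across bins."""
    total = sum(n for _, _, n in rows)
    return sum(n / total * abs(conf - freq) for conf, freq, n in rows)

# Toy data: ten predictions at 0.7 (seven came true, so calibrated) and
# ten predictions at 0.9 (six came true, so overconfident).
confidences = [0.7] * 10 + [0.9] * 10
outcomes = [1] * 7 + [0] * 3 + [1] * 6 + [0] * 4

rows = reliability_bins(confidences, outcomes)
for conf, freq, n in rows:
    print("confidence %.2f  observed %.2f  n=%d" % (conf, freq, n))
print("weighted gap: %.3f" % weighted_gap(rows))
```

Plotting observed frequency against mean confidence gives the diagram itself: a calibrated predictor lies on the diagonal, and an overconfident one falls below it.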

Theory Exercises

Show that for a bimodal target, the single trajectory minimizing expected squared error is the (bad) mean; contrast with the min-over-modes objective.
Derive softmax mode probabilities and the cross-entropy/NLL loss for the realized mode.
Define minADE, minFDE, and miss rate formally for \(K\) modes; show how each behaves under mode-averaging.
Explain calibration; describe how a reliability diagram is built and what miscalibration looks like.
Distinguish aleatoric vs epistemic uncertainty with an AV example of each.

Implementation

Build a prediction data structure (K modes + probabilities) and a metrics module (minADE, minFDE, miss rate, NLL) in Python, operating on the Week 2 trajectory representation. Create a few hand-built scenarios with known multimodal ground truth (e.g. an agent that could turn or go straight) and a constant-velocity baseline predictor to score against.
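One possible shape for the data structure and the baseline, as a sketch only. It assumes the Week 2 trajectory representation can be treated as a list of (x, y) points sampled at a fixed timestep, which may not match your actual Week 2 code; `Prediction` and `constant_velocity` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]  # assumed stand-in for the Week 2 representation

@dataclass
class Prediction:
    """K candidate futures with one probability per mode."""
    modes: List[Trajectory]
    probs: List[float]

    def __post_init__(self):
        if len(self.modes) != len(self.probs):
            raise ValueError("need exactly one probability per mode")
        if abs(sum(self.probs) - 1.0) > 1e-6:
            raise ValueError("mode probabilities must sum to 1")
        if len({len(m) for m in self.modes}) != 1:
            raise ValueError("all modes must share the same horizon")

def constant_velocity(history: Trajectory, horizon: int) -> Prediction:
    """Unimodal baseline: keep extrapolating the last observed displacement."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0  # displacement per timestep
    future = [(x1 + vx * (t + 1), y1 + vy * (t + 1)) for t in range(horizon)]
    return Prediction(modes=[future], probs=[1.0])

# Hand-built example: an agent moving along +y at 2.5 m per step.
history = [(0.0, 0.0), (0.0, 2.5), (0.0, 5.0)]
cv = constant_velocity(history, horizon=3)
print(cv.modes[0])  # three extrapolated waypoints
print(cv.probs)     # a single mode carrying all the probability
```

Validating in `__post_init__` catches malformed predictions at construction time, before they reach the metrics module.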

Benchmark

On the scenario set, compare a unimodal constant-velocity baseline against a hand-tuned multimodal baseline using all metrics. Show concretely that the unimodal predictor’s ADE averaged over the whole scenario set looks fine, but its minADE and miss rate at branch points expose the failure.

Expected baselines: constant-velocity does well on straight segments and fails at decision points; the multimodal baseline trades a little straight-line accuracy for far better miss-rate at branches — the quantitative case for multimodality.
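A toy version of this comparison, as a sketch under stated assumptions: the scenarios, the turn templates (`LEFT`, `RIGHT`), the 2 m endpoint miss threshold, and all function names are invented for illustration and are not course-provided data. Results are reported separately for straight and branch scenarios so that the branch failures are not hidden by the easier straight cases.

```python
import math

def ade(pred, truth):
    return sum(math.dist(p, g) for p, g in zip(pred, truth)) / len(truth)

def min_ade(modes, truth):
    return min(ade(m, truth) for m in modes)

def is_miss(modes, truth, threshold=2.0):
    # Assumed convention: miss if no mode ends within `threshold` metres of truth.
    return min(math.dist(m[-1], truth[-1]) for m in modes) > threshold

def constant_velocity(history, horizon):
    """Unimodal baseline: one mode extrapolating the last displacement."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0
    return [[(x1 + vx * t, y1 + vy * t) for t in range(1, horizon + 1)]]

# Hand-tuned turn templates: offsets from the last observed point, heading +y.
LEFT = [(-1.0, 4.0), (-4.0, 7.0), (-8.0, 8.0)]
RIGHT = [(1.0, 4.0), (4.0, 7.0), (8.0, 8.0)]

def hand_tuned_multimodal(history, horizon):
    """Three modes: constant velocity plus a left and a right turn template."""
    x, y = history[-1]
    turns = [[(x + dx, y + dy) for dx, dy in tpl[:horizon]] for tpl in (LEFT, RIGHT)]
    return constant_velocity(history, horizon) + turns

def summarize(predictor, scenarios):
    """Return (average minADE, miss rate) over (history, truth) pairs."""
    n = len(scenarios)
    preds = [predictor(hist, len(truth)) for hist, truth in scenarios]
    avg_min_ade = sum(min_ade(p, s[1]) for p, s in zip(preds, scenarios)) / n
    miss_rate = sum(is_miss(p, s[1]) for p, s in zip(preds, scenarios)) / n
    return avg_min_ade, miss_rate

history = [(0.0, -10.0), (0.0, -5.0), (0.0, 0.0)]
straight = [(history, [(0.1, 5.0), (0.0, 10.1), (-0.1, 15.0)])] * 3
branch = [
    (history, [(-1.2, 4.1), (-4.3, 6.8), (-7.6, 8.3)]),  # agent turns left
    (history, [(1.0, 3.9), (3.8, 7.2), (8.4, 7.9)]),     # agent turns right
]

for name, predictor in [("constant velocity", constant_velocity),
                        ("hand-tuned multimodal", hand_tuned_multimodal)]:
    for label, subset in [("straight", straight), ("branch", branch)]:
        avg, rate = summarize(predictor, subset)
        print(f"{name:22s} {label:8s} minADE={avg:.2f} m  miss rate={rate:.2f}")
```

Because this multimodal baseline keeps the constant-velocity mode, its minADE on straight segments is unchanged in the toy setup; any cost there would show up in NLL, since the straight mode no longer receives probability 1.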

Connections

This representation and metric suite is what Week 5’s learned predictor optimizes and reports, and what Week 6’s planner consumes (it must plan against multiple weighted futures). Calibration ties back to Course 5’s information theory. The drift discussion returns in the Week 10 deployment, where input distribution shift is a real failure mode.