Week 8 — Perception Baselines: Classical + Tiny Learned Detection
Overview
Now that the engine produces labeled sensor data, this week closes the perception loop: run simple detectors on the simulated camera output and evaluate them against the exact ground truth from Week 7. The point is not to build a state-of-the-art detector — it is to demonstrate that the simulator is a working perception test bench, and to build intuition for the classical-vs-learned tradeoff. You implement a classical detector (lane lines via edge detection + Hough/line-fitting; traffic-light state via color thresholding) and optionally a tiny CNN, then measure precision/recall against ground truth.
This is where the synthetic ground truth pays off: every detection can be scored exactly, with no labeling effort, across as many scenarios as you can generate. Course 5’s signal/image foundations and Course 1’s ML background inform the methods; here they operate on data the engine produced.
Readings
- MIT Machine Vision: filtering, edges, line finding, and motion vision (skim). Extract: the classical detection pipeline (filter → edges → fit).
- CS231n: CNNs and detection/segmentation (skim). Extract: the tiny-CNN detector structure and its training.
- Szeliski: feature detection and detection basics. Extract: classical feature/line detection and evaluation metrics.
Key Concepts
Classical lane and light detection
Lane detection: grayscale → Gaussian blur → gradient/edge (Sobel/Canny) → region of interest → Hough transform or RANSAC line fit → lane lines in image space → back-project to the road (Week 7 depth/intrinsics). Traffic-light state: locate the light region (or use a known projected position), threshold in HSV to classify red/yellow/green. These are deterministic, interpretable, and fast — the right baseline.
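The line-finding stage can be illustrated with a minimal Hough voting sketch in pure Python. It assumes the earlier stages have already produced a list of edge pixel coordinates; the function name `hough_lines`, the 1-degree angle resolution, and the 1-pixel rho resolution are illustrative choices, not part of the engine.

```python
import math

def hough_lines(edge_points, width, height, n_theta=180, top_k=1):
    """Vote in (rho, theta) space and return the top_k peaks.

    Each edge pixel (x, y) votes for every line rho = x*cos(theta) + y*sin(theta)
    that passes through it. Returns a list of (rho, theta_degrees, votes).
    """
    diag = int(math.ceil(math.hypot(width, height)))
    # rho lies in [-diag, diag]; shift by diag to index the accumulator rows.
    acc = [[0] * n_theta for _ in range(2 * diag + 1)]
    trig = [(math.cos(math.pi * t / n_theta), math.sin(math.pi * t / n_theta))
            for t in range(n_theta)]
    for x, y in edge_points:
        for t, (c, s) in enumerate(trig):
            rho = int(round(x * c + y * s))
            acc[rho + diag][t] += 1
    peaks = [(acc[r][t], r - diag, t)
             for r in range(2 * diag + 1)
             for t in range(n_theta)
             if acc[r][t] > 0]
    peaks.sort(reverse=True)  # strongest accumulator cells first
    return [(rho, t * 180.0 / n_theta, votes) for votes, rho, t in peaks[:top_k]]

# Hypothetical edge map: a vertical lane edge at x = 40 with a 20-pixel gap,
# plus three isolated noise pixels.
points = [(40, y) for y in range(100) if not 30 <= y < 50]
points += [(10, 20), (70, 5), (85, 90)]
best = hough_lines(points, width=100, height=100)[0]
print(best)
```

In this example the strongest peak is the vertical line at rho = 40, theta = 0 with 80 votes. The gap only lowers the vote count, and the noise pixels vote into other cells, which is why voting tolerates broken lane markings. A real pipeline would more likely use a library implementation and add peak suppression to separate nearby lines.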
A tiny learned detector (optional)
A small CNN trained on the simulator’s labeled images can detect/classify objects (vehicles, lights). With synthetic ground truth you can generate unlimited training data. Keep it tiny and understood; the goal is the classical-vs-learned comparison, not leaderboard performance.
Evaluation against ground truth
Score detections with precision, recall, F1, and IoU against the Week 7 semantic/box ground truth. Because the ground truth is exact, the metrics carry no labeling noise. Sweep the conditions that the Week 9 scenario system can vary (lighting, occlusion, distance) to find where each detector breaks.
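A minimal sketch of box scoring follows. It assumes boxes are axis-aligned `(x_min, y_min, x_max, y_max)` tuples in pixels and uses greedy one-to-one matching at an IoU threshold of 0.5; the box format, the threshold, and the function names are assumptions for illustration, not the Week 7 data format.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def score(detections, truths, iou_threshold=0.5):
    """Greedy one-to-one matching; returns (precision, recall, f1)."""
    unmatched = list(truths)
    tp = 0
    for det in detections:
        # Pick the still-unmatched ground-truth box that overlaps most.
        best = max(unmatched, key=lambda t: iou(det, t), default=None)
        if best is not None and iou(det, best) >= iou_threshold:
            unmatched.remove(best)
            tp += 1
    fp = len(detections) - tp   # detections with no ground-truth match
    fn = len(unmatched)         # ground-truth boxes nobody detected
    precision = tp / (tp + fp) if detections else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1

truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(1, 1, 11, 11), (50, 50, 60, 60), (20, 20, 30, 30)]
precision, recall, f1 = score(detections, truths)
```

Here the first detection overlaps its ground-truth box with IoU 81/119 (about 0.68), the second matches nothing, and the third is exact, so precision is 2/3, recall is 1.0, and F1 is 0.8. If detections carry confidence scores, sorting them by descending confidence before matching is the usual refinement.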
The sim-to-real caveat
Synthetic perception is easier than real-world perception (no sensor noise, perfect labels, simple textures). Acknowledge this explicitly: the simulator validates logic and integration, not real-world robustness. This honesty is part of staff-level engineering judgment.
Theory Exercises
- Walk through the classical lane-detection pipeline; explain the role of each stage and where it fails.
- Define precision, recall, F1, and IoU; compute them for a small set of detections vs ground-truth boxes.
- Explain the Hough transform for line detection and why it is robust to gaps/noise.
- Describe how to back-project an image-space lane detection to road coordinates using depth and intrinsics (Week 7).
- Discuss why synthetic perception over-states real-world accuracy and what that means for using the simulator.
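For the back-projection exercise, the pinhole relation can be sketched as below. This assumes the depth value is the distance along the optical axis (not the Euclidean range), a camera frame with x right, y down, z forward, and intrinsics `fx`, `fy`, `cx`, `cy` in pixels; the conventions Week 7 actually uses may differ and should be checked. Converting the camera-frame point to road coordinates additionally needs the camera pose, which is not shown here.

```python
def back_project(u, v, depth, fx, fy, cx, cy):
    """Pixel (u, v) with depth along the optical axis -> camera-frame (x, y, z)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def project(x, y, z, fx, fy, cx, cy):
    """Inverse mapping, useful as a round-trip check: camera-frame point -> pixel."""
    return (fx * x / z + cx, fy * y / z + cy)

# Hypothetical intrinsics for a 640x480 image.
fx = fy = 500.0
cx, cy = 320.0, 240.0

# A lane pixel 100 px right of and 100 px below the principal point, 10 m away.
point = back_project(420.0, 340.0, 10.0, fx, fy, cx, cy)
pixel = project(*point, fx, fy, cx, cy)
```

With these numbers the lane pixel maps to (2.0, 2.0, 10.0) in the camera frame, and projecting that point returns the original pixel, which is a cheap sanity check on the intrinsics before trusting any road-space error metric.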
Implementation
Implement a classical lane detector (edges + Hough/RANSAC, back-projected via Week 7 depth) and a traffic-light state classifier (HSV thresholding). Optionally train a tiny CNN on generated labeled data. Build an evaluation harness that scores detections against Week 7 ground truth (precision/recall/F1/IoU). Visualize detections overlaid on the camera image.
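One possible shape for the traffic-light state classifier is sketched below using only the standard library. It assumes the light region has already been cropped and is passed as a list of 8-bit `(r, g, b)` pixels; the hue bands, the saturation and value floors, and the minimum-fraction rule are illustrative starting values to tune against the simulator's actual light colors.

```python
import colorsys

# Hue bands in degrees; red wraps around 0. These ranges are assumptions.
HUE_BANDS = {
    "red": [(0, 20), (340, 360)],
    "yellow": [(40, 70)],
    "green": [(90, 160)],
}

def classify_light(pixels, min_sat=0.5, min_val=0.5, min_fraction=0.05):
    """Return 'red', 'yellow', 'green', or 'unknown' for a cropped light region."""
    counts = {name: 0 for name in HUE_BANDS}
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        if s < min_sat or v < min_val:
            continue  # dark housing or washed-out pixel: carries no color evidence
        hue = h * 360.0
        for name, bands in HUE_BANDS.items():
            if any(lo <= hue < hi for lo, hi in bands):
                counts[name] += 1
    state = max(counts, key=counts.get)
    # Require a minimum share of lit pixels so stray colors do not decide the state.
    if not pixels or counts[state] < min_fraction * len(pixels):
        return "unknown"
    return state

housing = [(20, 20, 20)] * 80
red_light = housing + [(255, 0, 0)] * 20
green_light = housing + [(0, 255, 0)] * 20
print(classify_light(red_light), classify_light(green_light), classify_light(housing))
```

Returning "unknown" for an unlit or ambiguous region keeps the evaluation harness honest: it can count those frames as misses rather than scoring a forced guess.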
Benchmark
Detection latency (classical vs CNN), precision/recall/F1/IoU against ground truth, and a robustness sweep across lighting/occlusion/distance. CNN training time on Mac (MPS) vs Jetson (CUDA), connecting to Course 1’s ML systems work.
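Expected baselines: the classical detector is fast and accurate on clean synthetic data, degrading predictably with occlusion/distance; the tiny CNN trades latency for robustness on harder cases. Treat these as hypotheses to confirm with your own measurements. Exact ground truth removes labeling error from every number, but the numbers still describe performance on synthetic data only.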
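The latency comparison can use a small timing harness like the sketch below. It assumes each detector is a plain callable that takes one frame, and it reports the median and 95th-percentile wall-clock time after a short warm-up; the function name `measure_latency` and the warm-up count are illustrative. For a CNN running on MPS or CUDA, the device would also need to be synchronized before each clock read, which this sketch does not do.

```python
import statistics
import time

def measure_latency(detector, frames, warmup=3):
    """Time detector(frame) over a non-empty list of frames; results in milliseconds."""
    for frame in frames[:warmup]:
        detector(frame)  # warm caches and lazy initialization before timing
    samples_ms = []
    for frame in frames:
        start = time.perf_counter()
        detector(frame)
        samples_ms.append((time.perf_counter() - start) * 1e3)
    samples_ms.sort()
    p95_index = min(len(samples_ms) - 1, int(0.95 * len(samples_ms)))
    return {
        "median_ms": statistics.median(samples_ms),
        "p95_ms": samples_ms[p95_index],
        "n": len(samples_ms),
    }

# Stand-in detector and frames, only to show the call shape.
frames = [[1, 2, 3]] * 20
stats = measure_latency(lambda frame: sum(frame), frames)
```

Reporting the median alongside a tail percentile avoids letting a single slow frame, or a single fast one, define the headline number.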
Connections
This proves the engine is a usable perception test bench — the core portfolio claim. It consumes Week 7’s sensor data and is exercised by Week 9’s scenarios. It connects to Course 1 (the ML detector is a workload to train/profile) and Course 5/6 (image/signal foundations). Week 10 packages this into the integrated demo.
Further Reading
- Szeliski, Computer Vision, chapters on feature/edge detection.
- MIT Machine Vision lectures on filtering and line finding.
- CS231n detection material (for the optional CNN).