Week 1 — Engine Skeleton, Windowing, and the Deterministic Main Loop

Overview

This week sets the structural foundation for everything that follows: a deterministic simulation loop, a windowing layer, and the engine scaffolding that every subsequent system (renderer, physics, sensors, planner) will lean on. By the end you should have a C++20 application that opens a window, reads input, ticks a fixed-timestep simulation, and renders at a variable rate — even if the renderer is currently a clear-color stub. The goal is not visual output yet; it is correctness, modularity, and a main loop that other engineers (and your future self in Week 10) can trust.

A self-driving simulator has three properties that make engine architecture delicate: physics must be deterministic and reproducible so scenario regression tests are meaningful; sensors will be queried at fixed cadences that may differ from the render rate (a 30 Hz camera while rendering at 120 Hz); and the loop will eventually run headless from scenario scripts, with the same code path producing identical results. Decoupling simulation time from wall-clock render time is therefore the first design decision, not later polish. The coordinate and transform math this rests on is assumed from Course 5 — here we fix conventions and build the loop.

Readings

FCG: graphics pipeline overview. Extract: the conceptual stages (model → world → view → clip → NDC → screen) so the loop knows what the renderer will eventually do.
MGPV: Vulkan instance/device/swapchain overview (preview only — don’t implement yet). Extract: the taxonomy of objects Week 2 will instantiate.
3DMP (review): coordinate-system and transform conventions. Extract: row- vs column-vector convention and why \(M_{world\to camera}=M_{camera\to world}^{-1}\).
(Linear maps, homogeneous transforms, and quaternions: assumed from Course 5.)

Key Concepts

Coordinate conventions, fixed now

Adopt a right-handed world frame and the ISO 8855 vehicle body frame (\(+x\) forward, \(+y\) left, \(+z\) up); never deviate. Represent rigid transforms as \(4\times4\) homogeneous matrices \(T=\begin{bmatrix}R&t\\0&1\end{bmatrix}\) acting on column vectors (\(p'=Tp\)), and read composition right-to-left: in \(T_2T_1\), \(T_1\) is applied first. Mismatched frame conventions are among the most common sources of integration bugs in robotics codebases; one documented convention prevents most of them.
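The convention can be made concrete with a small sketch. It assumes column vectors and uses a hypothetical `Rigid` struct that stores \(R\) and \(t\) separately; the names `Rigid`, `apply`, `compose`, and `inverse` are illustrative and are not the GLM types the engine will use.

```cpp
#include <array>

// Hypothetical minimal rigid transform (illustration only, not the engine's math types).
// Column-vector convention: p_out = R * p_in + t.
using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;  // R[row][col]

struct Rigid {
    Mat3 R;
    Vec3 t;
};

// Multiply a 3x3 rotation by a vector.
inline Vec3 rotate(const Mat3& R, const Vec3& v) {
    Vec3 out{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            out[i] += R[i][j] * v[j];
    return out;
}

// Map a point from the source frame to the destination frame.
inline Vec3 apply(const Rigid& T, const Vec3& p) {
    const Vec3 r = rotate(T.R, p);
    return {r[0] + T.t[0], r[1] + T.t[1], r[2] + T.t[2]};
}

// compose(A, B) is the product A * B: B is applied first, then A (right-to-left).
inline Rigid compose(const Rigid& A, const Rigid& B) {
    Rigid out{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                out.R[i][j] += A.R[i][k] * B.R[k][j];
    const Vec3 At = rotate(A.R, B.t);
    out.t = {At[0] + A.t[0], At[1] + A.t[1], At[2] + A.t[2]};
    return out;
}

// Rigid inverse: rotation R^T, translation -R^T t. No general 4x4 inverse needed.
inline Rigid inverse(const Rigid& T) {
    Rigid out{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            out.R[i][j] = T.R[j][i];
    const Vec3 Rt = rotate(out.R, T.t);
    out.t = {-Rt[0], -Rt[1], -Rt[2]};
    return out;
}
```

With a body-to-world transform `T`, `apply(T, p)` takes a body-frame point into the world, and `compose(inverse(T), T)` should come back as the identity, which makes a convenient first unit test.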

Fixed vs variable timestep

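A naive loop ticks simulation by dt = now - last. That is non-reproducible, because dt follows wall-clock frame timing and the sequence of floating-point updates therefore differs from run to run, and it destabilizes integrators when a frame spike produces a large dt. The canonical accumulator pattern fixes both: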

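// Needs <algorithm> (std::min) and <chrono>; sim, renderer, and running live elsewhere.
using Clock = std::chrono::steady_clock;  // monotonic, immune to system clock changes
constexpr double kSimDt = 1.0 / 120.0;    // 120 Hz simulation
double accumulator = 0.0;
auto last = Clock::now();
while (running) {
    auto now = Clock::now();
    double frame = std::chrono::duration<double>(now - last).count();  // seconds
    last = now;
    accumulator += std::min(frame, 0.25);   // clamp to avoid the spiral of death
    while (accumulator >= kSimDt) {         // run as many fixed steps as the frame covers
        sim.step(kSimDt);
        accumulator -= kSimDt;
    }
    double alpha = accumulator / kSimDt;    // [0,1) render interpolation
    renderer.render(sim.state(), alpha);
}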

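Two runs fed the same per-tick input sequence then produce identical simulation states regardless of frame rate, because every step sees the same kSimDt and performs the same operations in the same order.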
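The frame-rate independence can be checked with a small experiment. The sketch below is an assumption-laden toy: `ToySim` (a point mass under gravity) and `runUntilTick` are hypothetical names, and wall-clock time is replaced by a constant simulated frame time so the test itself is reproducible.

```cpp
#include <algorithm>
#include <cstdint>

constexpr double kSimDt = 1.0 / 120.0;  // same fixed step as the loop above

// Hypothetical toy world: one point mass falling under gravity (semi-implicit Euler).
struct ToySim {
    double z = 10.0;
    double vz = 0.0;
    std::uint64_t tick = 0;

    void step(double dt) {
        vz += -9.81 * dt;
        z += vz * dt;
        ++tick;
    }
};

// Runs the accumulator loop with a constant simulated frame time until the
// simulation has taken exactly targetTick fixed steps.
inline ToySim runUntilTick(double frameSeconds, std::uint64_t targetTick) {
    ToySim sim;
    double accumulator = 0.0;
    while (sim.tick < targetTick) {
        accumulator += std::min(frameSeconds, 0.25);
        while (accumulator >= kSimDt && sim.tick < targetTick) {
            sim.step(kSimDt);
            accumulator -= kSimDt;
        }
    }
    return sim;
}
```

Comparing `runUntilTick(1.0 / 30.0, 240)` with `runUntilTick(1.0 / 144.0, 240)` should give the same `z` and `vz`: the frame time changes only how the 240 steps are grouped into frames, not what each step computes.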

Why separate sim and render

Rendering is non-deterministic in wall-clock terms (GPU scheduling, thermal state); letting it drive physics couples the world to that noise. Sensors need fixed cadences (a 30 Hz camera fires every 4 ticks at 120 Hz). Headless scenario runs (Week 9) must advance simulation as fast as possible with no rendering. The architecture must support all three without scattered conditionals.
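One way to keep sensor cadence independent of the render rate is to schedule sensors on the simulation tick counter. The following is a minimal sketch under that assumption; `SensorSchedule`, `periodTicksFor`, and `countCaptures` are hypothetical names, and only sensor rates that divide the simulation rate evenly are handled.

```cpp
#include <cstdint>

constexpr std::uint32_t kSimHz = 120;  // simulation rate assumed throughout this week

// Period in ticks for a sensor rate that divides the sim rate evenly.
// Returns 0 when the rate does not divide evenly (not handled in this sketch).
constexpr std::uint32_t periodTicksFor(std::uint32_t sensorHz) {
    return (sensorHz != 0 && kSimHz % sensorHz == 0) ? kSimHz / sensorHz : 0;
}

// Hypothetical tick-based schedule: the sensor fires on every periodTicks-th tick.
struct SensorSchedule {
    std::uint32_t periodTicks;

    bool due(std::uint64_t tick) const { return tick % periodTicks == 0; }
};

// Headless-style loop: no rendering, just advance ticks and count sensor captures.
inline std::uint64_t countCaptures(SensorSchedule schedule, std::uint64_t ticks) {
    std::uint64_t captures = 0;
    for (std::uint64_t tick = 0; tick < ticks; ++tick) {
        if (schedule.due(tick)) ++captures;
    }
    return captures;
}
```

Because the schedule depends only on the tick index, a windowed run and a headless run fire the camera on exactly the same ticks.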

Theory Exercises

Given \(T_{cam\to world}\), derive the view matrix and show the rigid-inverse shortcut \(T^{-1}=\begin{bmatrix}R^\top&-R^\top t\\0&1\end{bmatrix}\).
Explain model, world, view, and projection spaces and what each transform is responsible for.
Prove that composing transforms corresponds to changing coordinate frames.
Contrast fixed and variable timestep; explain why a semi-implicit Euler integrator destabilizes on a dt spike.
Explain why simulation and render updates must be separated, with a sensor-cadence example.
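For the timestep exercise, a numerical experiment can help check the reasoning. The sketch below assumes an undamped linear spring, for which semi-implicit Euler stays bounded only while \(\omega\,dt<2\); `peakAmplitude` is a hypothetical helper, and the spring stiffness is an arbitrary illustrative value.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical undamped spring: x'' = -omega2 * x, starting at x = 1, v = 0.
// Returns the largest |x| seen over `steps` semi-implicit Euler steps of size dt.
inline double peakAmplitude(double omega2, double dt, int steps) {
    double x = 1.0;
    double v = 0.0;
    double peak = 1.0;
    for (int i = 0; i < steps; ++i) {
        v += -omega2 * x * dt;  // velocity first, from the old position
        x += v * dt;            // then position, from the new velocity
        peak = std::max(peak, std::fabs(x));
    }
    return peak;
}
```

With `omega2 = 1.0e4` (\(\omega=100\) rad/s), a step of 1/120 s gives \(\omega\,dt\approx0.83\) and the amplitude stays near 1, while a 50 ms step gives \(\omega\,dt=5\) and the amplitude grows without bound.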

Implementation

Build the engine skeleton: a Window RAII wrapper over GLFW, a Logger, a latched input snapshot, and the accumulator loop above with std::chrono::steady_clock. Wrap math in GLM with GLM_FORCE_DEPTH_ZERO_TO_ONE and GLM_FORCE_RADIANS defined globally (Vulkan clip-space Z is \([0,1]\)). Add GoogleTest with transform/quaternion/accumulator unit tests, including an order-of-composition test (translate-then-rotate must differ from rotate-then-translate) to catch left/right-multiply errors; an associativity test alone cannot catch them, because matrix products are associative whichever order is used.
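The latched input snapshot can be sketched without any windowing code. This is one possible shape, not a prescribed design: `InputLatch`, `InputSnapshot`, and `kKeyCount` are hypothetical names, and the key indices are assumed to come from whatever callback the window layer provides.

```cpp
#include <bitset>
#include <cstddef>

constexpr std::size_t kKeyCount = 512;  // assumed large enough for the key codes in use

// Immutable view of input for one frame; the simulation reads only this.
struct InputSnapshot {
    std::bitset<kKeyCount> down;      // held when the snapshot was taken
    std::bitset<kKeyCount> pressed;   // went down since the previous snapshot
    std::bitset<kKeyCount> released;  // went up since the previous snapshot
};

class InputLatch {
public:
    // Called as events arrive, e.g. from the window layer's key callback.
    void onKey(std::size_t key, bool isDown) {
        if (key >= kKeyCount) return;  // ignore out-of-range key codes
        live_[key] = isDown;
    }

    // Called once per frame, before the fixed-step loop runs.
    InputSnapshot latch() {
        InputSnapshot snapshot;
        snapshot.down = live_;
        snapshot.pressed = live_ & ~previous_;
        snapshot.released = ~live_ & previous_;
        previous_ = live_;
        return snapshot;
    }

private:
    std::bitset<kKeyCount> live_;      // updated by events
    std::bitset<kKeyCount> previous_;  // state at the last latch
};
```

Every `sim.step` within a frame then sees the same snapshot, which is what makes recorded input replayable. One limitation of this sketch: a key pressed and released between two latches is lost, so a real implementation may want to queue events instead.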

Benchmark

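Measure four numbers every frame in an ImGui overlay (a stub for now): total frame time, sim update time, render time, input latency. Use a ScopeTimer RAII helper; report mean + p99 over a sliding 1-second window. Disable V-sync during benchmarking: glfwSwapInterval(0) when running on an OpenGL context; under Vulkan that call does not apply, and V-sync is controlled by the swapchain present mode instead.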
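A possible shape for the timing helper is sketched below. The `ScopeTimer` name comes from the text; its interface, the `FrameStats` struct, and the `summarize` function are assumptions, and the p99 uses the nearest-rank definition (other percentile definitions exist). Trimming samples to the last second is left to the caller.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

struct FrameStats {
    double meanMs;
    double p99Ms;
};

// Mean and nearest-rank 99th percentile of a set of samples in milliseconds.
// Takes the vector by value because it sorts its own copy.
inline FrameStats summarize(std::vector<double> samplesMs) {
    if (samplesMs.empty()) return {0.0, 0.0};
    double sum = 0.0;
    for (double s : samplesMs) sum += s;
    std::sort(samplesMs.begin(), samplesMs.end());
    const std::size_t n = samplesMs.size();
    const std::size_t rank = (n * 99 + 99) / 100;  // ceil(0.99 * n), always >= 1
    return {sum / static_cast<double>(n), samplesMs[rank - 1]};
}

// RAII timer: records the elapsed time of its scope, in milliseconds, into a sink.
class ScopeTimer {
public:
    explicit ScopeTimer(std::vector<double>& sink)
        : sink_(sink), start_(Clock::now()) {}

    ~ScopeTimer() {
        const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start_;
        sink_.push_back(elapsed.count());
    }

    ScopeTimer(const ScopeTimer&) = delete;
    ScopeTimer& operator=(const ScopeTimer&) = delete;

private:
    using Clock = std::chrono::steady_clock;
    std::vector<double>& sink_;
    Clock::time_point start_;
};
```

Typical use would be `{ ScopeTimer timer(simSamples); /* run the fixed-step loop */ }` around each measured region, with `summarize` called once per overlay refresh.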

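Expected baselines: with a clear-color stub, total frame ~1 ms, sim update ~0.05 ms (empty world), input latency under ~2 ms. Determinism check: identical input sequences yield bitwise-identical sim state across runs of the same build on the same machine (floating-point results are not guaranteed to match bit for bit across compilers or platforms).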

Connections

This loop is the spine. Week 2 plugs Vulkan into the render slot; Week 3 feeds real meshes through the transform pipeline; Week 6’s vehicle physics is reproducible only because the timestep is fixed; Week 9’s scenario regression tests are deterministic only because the loop is. Course 5’s linear algebra is the math under the transform conventions established here.