Week 6 — C/C++ Memory Layout: Structs, Arrays, Pointers, VTables, Objects

Overview

This week connects C/C++ abstractions to their concrete memory layout and machine behavior, which is the payoff of the previous weeks. You now understand bytes (Week 1), assembly (Week 4), and the ABI and stack (Week 5); here you see exactly how a struct is laid out with padding, how arrays and pointers index memory, how `this` and virtual dispatch work via vtables, and why data layout (array-of-structs vs structure-of-arrays) dominates cache performance. This is the knowledge that separates engineers who write fast, correct systems code from those who treat C++ as magic.

The layout reasoning here is reused immediately: Week 8’s cache experiments are about exactly this, and Course 2’s data-oriented design and Course 1’s tensor layouts are applications. By the end you can predict the byte-level layout of any aggregate and explain a virtual call’s machine-level cost.

Readings

ARM (Plantz) Ch. 16–18: bitwise logic, data structures, and object-oriented programming. Extract: struct layout, pointer arithmetic, and how objects/vtables look in memory.
HLW Ch. 15: development tools. Extract: the toolchain (compiler, linker, objdump, nm) for inspecting layout.
HLW Ch. 16: compiling software from C source. Extract: translation units, linking, and symbols.

Key Concepts

Struct layout, alignment, and padding

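A struct’s members are laid out in declaration order, each at an offset that is a multiple of its own alignment (Week 1). The compiler inserts padding between members to satisfy this, and at the end so that the struct’s size is a multiple of the struct’s alignment, which is the strictest alignment among its members (not necessarily that of the largest member: a char[16] is large but has alignment 1). The tail padding keeps every element of an array of the struct aligned. Reordering members, typically from strictest to weakest alignment, can shrink a struct significantly, which makes it a real memory-footprint lever. sizeof includes padding; offsetof gives member offsets. This is why two structs with the same fields in different orders can have different sizes.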
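To make the padding rules above concrete, here is a minimal sketch. The struct names and members (Padded, Reordered, tag, id, count, flags) are hypothetical, and the offsets in the comments assume an LP64 target where std::uint64_t has 8-byte alignment, as on x86-64 and ARM64 Linux/macOS.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>   // fixed-width integer types
#include <cstdio>

// Hypothetical example types; the names are illustrative only.
struct Padded {            // declaration order: 1, 8, 4, 2 bytes
    std::uint8_t  tag;     // offset 0, then 7 bytes of padding
    std::uint64_t id;      // offset 8
    std::uint32_t count;   // offset 16
    std::uint16_t flags;   // offset 20, then 2 bytes of tail padding
};

struct Reordered {         // same members, strictest alignment first
    std::uint64_t id;      // offset 0
    std::uint32_t count;   // offset 8
    std::uint16_t flags;   // offset 12
    std::uint8_t  tag;     // offset 14, then 1 byte of tail padding
};

// Assumption: 8-byte alignment for std::uint64_t (LP64 x86-64 / ARM64).
// On a target with different rules these checks fail at compile time.
static_assert(sizeof(Padded) == 24, "expected 24 bytes on LP64");
static_assert(sizeof(Reordered) == 16, "expected 16 bytes on LP64");

// Prints size, alignment, and member offsets of both structs.
void print_layouts() {
    // Size is always a whole multiple of the struct's alignment.
    assert(sizeof(Padded) % alignof(Padded) == 0);
    assert(sizeof(Reordered) % alignof(Reordered) == 0);

    std::printf("Padded:    size=%zu align=%zu tag@%zu id@%zu count@%zu flags@%zu\n",
                sizeof(Padded), alignof(Padded),
                offsetof(Padded, tag), offsetof(Padded, id),
                offsetof(Padded, count), offsetof(Padded, flags));
    std::printf("Reordered: size=%zu align=%zu id@%zu count@%zu flags@%zu tag@%zu\n",
                sizeof(Reordered), alignof(Reordered),
                offsetof(Reordered, id), offsetof(Reordered, count),
                offsetof(Reordered, flags), offsetof(Reordered, tag));
}
```

Calling print_layouts() from main shows the same four members occupying 24 bytes in one order and 16 in the other under the stated assumption; in an array of a million elements that difference is 8 MB.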

Arrays and pointer arithmetic

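An array is contiguous storage; a[i] is defined as *(a + i), and pointer arithmetic scales by sizeof(element), so a + i is i * sizeof(element) bytes past a. A pointer is just an address (Week 1) carrying a type that governs its arithmetic and dereference width. Multidimensional arrays are contiguous and row-major (the last index varies fastest): in an array with C columns, element [i][j] sits (i * C + j) * sizeof(element) bytes from the start. Understanding this is the basis for stride and cache behavior (Week 8).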
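The indexing identities above can be checked directly. This is a small sketch with hypothetical helper names (index_matches_pointer_form, byte_distance, flat_index, row_major_holds); addresses are compared as integers via std::uintptr_t, which assumes the flat address space of mainstream x86-64 and ARM64 targets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// a[i] is defined as *(a + i), so both forms yield the same address.
bool index_matches_pointer_form(const int* a, std::size_t i) {
    return &a[i] == a + i;
}

// Number of bytes between p and p + n; expected to equal n * sizeof(T).
// The caller must keep p + n within (or one past the end of) p's array.
template <typename T>
std::size_t byte_distance(const T* p, std::size_t n) {
    assert(p != nullptr);
    const auto lo = reinterpret_cast<std::uintptr_t>(p);
    const auto hi = reinterpret_cast<std::uintptr_t>(p + n);
    return static_cast<std::size_t>(hi - lo);
}

// Row-major flat index of element (row, col) in a grid with `cols` columns.
constexpr std::size_t flat_index(std::size_t row, std::size_t col, std::size_t cols) {
    return row * cols + col;
}

// Verifies that a built-in 2D array places [r][c] at flat_index * sizeof(int)
// bytes from its first element.
bool row_major_holds() {
    constexpr std::size_t kRows = 3, kCols = 4;
    int grid[kRows][kCols] = {};
    const auto base = reinterpret_cast<std::uintptr_t>(&grid[0][0]);
    for (std::size_t r = 0; r < kRows; ++r) {
        for (std::size_t c = 0; c < kCols; ++c) {
            const auto addr = reinterpret_cast<std::uintptr_t>(&grid[r][c]);
            if (addr - base != flat_index(r, c, kCols) * sizeof(int)) return false;
        }
    }
    return true;
}
```

Walking a row-major grid with the column index in the inner loop touches consecutive addresses; swapping the loops makes each step jump by a whole row, which is the stride effect Week 8 measures.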

Objects, `this`, and vtables

A C++ object with only non-virtual methods is laid out like a plain struct; each method receives a hidden `this` pointer as its first argument (Week 5 ABI: x0 on ARM64, rdi on x86-64). Virtual functions add a hidden vtable pointer to the object; a virtual call loads the vtable pointer from the object, loads the function address from the right slot, and branches to it indirectly. That is two dependent loads plus an indirect call where a direct call needs neither, and it is a barrier to inlining unless the compiler can prove the dynamic type and devirtualize. Seeing this in the disassembly demystifies polymorphism and its cost.
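One way to see the hidden vtable pointer is to read the first pointer-sized bytes of a polymorphic object. The sketch below assumes the Itanium C++ ABI convention that a class with no bases stores its vtable pointer at offset 0; the type and function names (Plain, Shape, Square, read_vptr, dump_bytes) are hypothetical. It is an inspection trick for learning, not something portable code should rely on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

struct Plain {                       // non-virtual: laid out like a C struct
    std::int32_t value;
    std::int32_t get() const { return value; }  // `this` is a hidden argument
};

struct Shape {                       // polymorphic: gains a hidden vtable pointer
    std::int32_t sides;
    explicit Shape(std::int32_t s) : sides(s) {}
    virtual ~Shape() = default;
    virtual std::int32_t area_units() const { return 1; }
};

struct Square : Shape {
    Square() : Shape(4) {}
    std::int32_t area_units() const override { return 2; }
};

// Copies the first pointer-sized bytes of the object into an integer.
// Assumption: the vtable pointer lives at offset 0 (Itanium C++ ABI layout).
std::uintptr_t read_vptr(const Shape& s) {
    std::uintptr_t vptr = 0;
    std::memcpy(&vptr, static_cast<const void*>(&s), sizeof vptr);
    assert(vptr != 0);               // a constructed object has its vptr set
    return vptr;
}

// Prints n bytes starting at p in hex, in memory order.
void dump_bytes(const void* p, std::size_t n) {
    const auto* bytes = static_cast<const unsigned char*>(p);
    for (std::size_t i = 0; i < n; ++i) {
        std::printf("%02x ", static_cast<unsigned>(bytes[i]));
    }
    std::printf("\n");
}
```

Under these assumptions, two Shape objects report the same read_vptr value because they share one vtable, while a Square reports a different one. On LP64 sizeof(Shape) is expected to be 16 (8-byte pointer, 4-byte int, 4 bytes of tail padding) against 4 for Plain, and dump_bytes(&shape, sizeof shape) shows the pointer bytes first.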

For the common scalar types, layout is almost entirely ABI-driven and near-identical across the two ISAs: ARM64 and x86-64 Linux/macOS are both LP64 with the same sizes and alignments for integers, pointers, float, and double, and both use the Itanium C++ ABI (or a close derivative of it) for the vtable mechanism, so a struct’s sizeof/offsetof and an object’s vtable layout match on both. long double is a notable exception and should be checked per target. The remaining differences surface in the instructions of the virtual call (ldr/blr vs mov/call through the vtable slot) and the `this`-passing register. Emitting the LLVM IR for a virtual call shows the vtable loads, the getelementptr, and the indirect call once, target-independently: the indirection is in the IR, while the register choice is the backend’s.
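A pair of small functions is enough to compare the two call sequences in a disassembler or in LLVM IR. The names below (Counter, AddOne, PlainAddOne, call_direct, call_virtual) are hypothetical, and the instruction shapes mentioned in the comments are what mainstream compilers typically emit, not a guarantee.

```cpp
#include <cassert>

// Hypothetical interface with one virtual method.
struct Counter {
    virtual ~Counter() = default;
    virtual int step(int x) const = 0;
};

struct AddOne final : Counter {
    int step(int x) const override { return x + 1; }
};

// Same behavior without virtual dispatch.
struct PlainAddOne {
    int step(int x) const { return x + 1; }
};

// Direct call: the callee is known at compile time, so the compiler can
// emit a plain call or inline the body entirely.
int call_direct(const PlainAddOne& c, int x) { return c.step(x); }

// Virtual call: the dynamic type is unknown here, so the compiler typically
// loads the vtable pointer from the object, loads the slot for step(), and
// branches indirectly (ldr, ldr, blr on ARM64; mov, then call through the
// slot on x86-64). The slot offset depends on the declaration order.
int call_virtual(const Counter& c, int x) { return c.step(x); }

// Both paths compute the same value; only the call mechanism differs.
void check_same_result() {
    AddOne v;
    PlainAddOne p;
    assert(call_direct(p, 41) == call_virtual(v, 41));
}
```

Assuming the file is saved as vcall.cpp (a made-up name), compiling with g++ -O1 -c vcall.cpp and running objdump -d -C vcall.o shows both functions side by side; clang++ -O1 -S -emit-llvm vcall.cpp -o vcall.ll writes the target-independent IR. At higher optimization levels the indirect call may appear as a tail jump rather than a call.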

Data layout and cache (AoS vs SoA)

The same data stored as an array-of-structs (each element’s fields together) or as a structure-of-arrays (each field in its own array) has very different cache behavior: a loop touching one field streams contiguously in SoA but strides over padding and the other fields in AoS, so most of every cache line it fetches goes unused. For loops like this, it is often the single most important practical layout decision for performance, and it is the direct setup for Week 8’s cache experiments and Course 2’s agent storage.
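The two layouts can be sketched side by side. The particle record and the function names here (ParticleAoS, ParticlesSoA, to_soa, sum_mass_aos, sum_mass_soa) are hypothetical; the point is only that both loops compute the same sum while touching memory very differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structs element: four floats, 16 bytes, fields interleaved.
struct ParticleAoS {
    float x, y, z, mass;
};

// Structure-of-arrays: each field in its own contiguous array.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
    explicit ParticlesSoA(std::size_t n) : x(n), y(n), z(n), mass(n) {}
    std::size_t size() const { return mass.size(); }
};

// Converts AoS data to the SoA layout, field by field.
ParticlesSoA to_soa(const std::vector<ParticleAoS>& aos) {
    ParticlesSoA soa(aos.size());
    assert(soa.size() == aos.size());
    for (std::size_t i = 0; i < aos.size(); ++i) {
        soa.x[i] = aos[i].x;
        soa.y[i] = aos[i].y;
        soa.z[i] = aos[i].z;
        soa.mass[i] = aos[i].mass;
    }
    return soa;
}

// AoS: reads 4 useful bytes out of every 16-byte element (stride 16).
double sum_mass_aos(const std::vector<ParticleAoS>& ps) {
    double total = 0.0;
    for (const ParticleAoS& p : ps) total += p.mass;
    return total;
}

// SoA: reads the mass array contiguously (stride 4), so every byte of
// each fetched cache line is used.
double sum_mass_soa(const ParticlesSoA& ps) {
    double total = 0.0;
    for (float m : ps.mass) total += m;
    return total;
}
```

With a 64-byte cache line, the AoS loop uses 4 of every 16 bytes it pulls in, so it moves roughly four times as much memory as the SoA loop for the same result. The trade-off reverses for code that needs all fields of one element at once, where AoS keeps them on the same line.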

Theory Exercises

Compute the size and member offsets of a struct with mixed-width members; reorder it to minimize padding and recompute.
Show that a[i] equals *(a + i) and how pointer arithmetic scales by element size; give a row-major 2D indexing formula.
Draw the memory layout of a C++ object with a vtable; trace the instructions of a virtual call (Week 4/5).
Contrast AoS vs SoA cache behavior for a loop summing one field over many elements; predict which is faster and why.
Explain how the linker resolves a symbol across two translation units (this week’s HLW Ch. 16 reading) and what an unresolved-symbol error means.

Implementation

Write programs that print sizeof/offsetof for several structs (including a padding-minimized reorder), demonstrate pointer arithmetic and row-major 2D indexing, and expose a vtable layout (print the object’s bytes, find the vtable pointer). Implement the same computation over AoS and SoA data. Inspect everything with objdump/nm and the Week 1 byte dumper.

Measurement / Inspection

Measure struct sizes before/after member reordering. Disassemble a virtual call and count the extra instructions vs a direct call. Benchmark the AoS vs SoA field-sum loop over a large array and measure the speedup (this directly previews Week 8’s cache work).

Expected baselines: reordering members reduces struct size; a virtual call shows extra loads (the vtable pointer, then the function slot) and an indirect branch compared with a direct call; the SoA field-sum is meaningfully faster than AoS for arrays too large to fit in cache. That is the cache-locality win quantified, and the foundation for Week 8.
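The AoS vs SoA benchmark described above can be sketched with std::chrono. Everything named here (Agent, BenchResult, time_ms, bench_field_sum) is hypothetical, and this is a deliberately simple single-shot timer rather than a rigorous harness; the Agent record is expected to be 64 bytes on LP64, one typical cache line per element.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical record: 7 doubles + 1 int, padded to 64 bytes on LP64.
struct Agent {
    double pos[3];
    double vel[3];
    double energy;
    int    id;
};

struct BenchResult {
    double aos_ms, soa_ms;    // wall-clock time of each loop, milliseconds
    double aos_sum, soa_sum;  // results, returned so the loops stay live
};

// Runs body once, stores its return value in result, returns elapsed ms.
template <typename F>
double time_ms(const F& body, double& result) {
    const auto start = std::chrono::steady_clock::now();
    result = body();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Sums the energy field over n elements in both layouts and times each.
BenchResult bench_field_sum(std::size_t n) {
    std::vector<Agent> aos(n);        // AoS: energy interleaved with other fields
    std::vector<double> energy(n);    // SoA: the energy column on its own
    for (std::size_t i = 0; i < n; ++i) {
        const double e = static_cast<double>(i % 16);
        aos[i].energy = e;
        energy[i] = e;
    }

    BenchResult r{};
    r.aos_ms = time_ms([&] {
        double s = 0.0;
        for (const Agent& a : aos) s += a.energy;
        return s;
    }, r.aos_sum);
    r.soa_ms = time_ms([&] {
        double s = 0.0;
        for (double e : energy) s += e;
        return s;
    }, r.soa_sum);

    assert(r.aos_sum == r.soa_sum);   // same data, same answer
    return r;
}
```

For a meaningful comparison, build with optimization (for example -O2), choose n so the AoS array is well beyond the last-level cache, and repeat the measurement several times, reporting the median. The ratio aos_ms / soa_ms is the speedup to record; no particular value is assumed here.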

Connections

This layout reasoning is exactly what Week 8 measures with stride experiments, and what Course 2’s data-oriented agent storage and Course 1’s tensor layouts apply for performance. The vtable mechanism builds on Week 5’s ABI; struct padding builds on Week 1’s alignment. Reading layout with the toolchain is a daily systems-engineering skill.