Week 5 — Control Flow, Stack Frames, Function Calls, and the ABI
Overview
This week explains two things that are really one: how high-level control flow (if/loops/switch) becomes branches and condition codes, and what actually happens when a function is called — the stack frame, argument passing, register preservation, and return. It consolidates the control-flow and the stack/ABI weeks because they are inseparable: a function call is a controlled branch plus a stack discipline. The ABI (Application Binary Interface) is the contract that makes separately compiled code interoperate — which registers are arguments, which are callee-saved, how the stack is laid out — and understanding it is what lets you read any disassembly, debug a corrupted stack, and reason about recursion.
Building on Week 4’s instructions and flags, you will trace and write function-calling assembly, draw stack frames, and see recursion as nested frames. This is the layer where C “becomes” machine code, setting up Week 6’s data-layout view.
Readings
- ARM (Plantz) Ch. 13: control-flow constructs. Extract: how if/while/for/switch lower to compares and branches.
- ARM Ch. 11: inside the main function. Extract: the stack frame and prologue/epilogue.
- ARM Ch. 14–15: inside subfunctions and special uses. Extract: the AArch64 calling convention, argument/return registers, and callee/caller-saved rules.
- CA CPU control-flow sections: branching at the datapath level.
Key Concepts
Control flow as compare-and-branch
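An if (a < b) becomes cmp a, b (set flags) followed by a conditional branch (b.lt, b.eq, …) that reads the flags from Week 4. Compilers usually test the inverted condition (for example b.ge) to jump over the then-block, and they choose signed conditions (b.lt, b.ge) or unsigned ones (b.lo, b.hs) according to the C types. Loops are a conditional branch back to a label; switch becomes a jump table (dense cases) or a compare chain (sparse cases). There is no structured control flow at the machine level, only branches, and seeing this makes the compiler’s lowering transparent. Branchless idioms such as conditional select (csel) and conditional set (cset) avoid branches, which pays off when a condition is hard to predict.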
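To make the lowering concrete, here is a small C sketch (all function names are hypothetical, chosen for illustration). It shows the same loop in structured form, rewritten with only labels and conditional gotos the way a compiler lowers it, and in a branchless form, plus a dense switch next to a table-dispatch analogy. The assembly in the comments is the typical AArch64 shape, not verified compiler output; real output varies with compiler and optimization level.

```c
/* Structured form: count the elements of a[0..n) that are below limit. */
int count_below(const int *a, int n, int limit) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] < limit)
            count++;
    }
    return count;
}

/* The same logic written the way a compiler lowers it: labels plus
 * conditional gotos. Each "if (...) goto" stands for one cmp followed by one
 * b.cond; note the inverted tests that skip over the body. */
int count_below_lowered(const int *a, int n, int limit) {
    int count = 0;
    int i = 0;
loop_test:
    if (i >= n) goto done;            /* cmp i, n ; b.ge done (often moved to the loop bottom) */
    if (a[i] >= limit) goto next;     /* cmp a[i], limit ; b.ge next */
    count++;
next:
    i++;
    goto loop_test;                   /* b loop_test (unconditional) */
done:
    return count;
}

/* Branchless variant: the comparison result is used as a 0/1 value, the kind
 * of code where a compiler may emit cset/csel instead of a branch. */
int count_below_branchless(const int *a, int n, int limit) {
    int count = 0;
    for (int i = 0; i < n; i++)
        count += (a[i] < limit);      /* cmp ; cset t, lt ; add */
    return count;
}

/* A dense switch and a jump-table analogy. A real jump table holds code
 * addresses inside one function; an array of function pointers is the
 * closest portable C equivalent. */
static int op_add(int a, int b) { return a + b; }
static int op_sub(int a, int b) { return a - b; }
static int op_mul(int a, int b) { return a * b; }

int apply_switch(int op, int a, int b) {
    switch (op) {
    case 0: return a + b;
    case 1: return a - b;
    case 2: return a * b;
    default: return 0;
    }
}

int apply_table(int op, int a, int b) {
    static int (*const table[])(int, int) = {op_add, op_sub, op_mul};
    /* One unsigned compare rejects both negative and too-large op values,
     * as in "cmp w0, #2 ; b.hi default". */
    if ((unsigned)op > 2u)
        return 0;
    return table[op](a, b);           /* indexed load, then indirect call (blr; a real jump table uses br) */
}
```

The unsigned range check is worth noticing: casting to unsigned lets one comparison guard both ends of the table, which is why jump-table code usually contains a single b.hi (or b.hs) before the indexed branch.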
The stack frame
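The stack grows downward, toward lower addresses. Each call typically gets its own frame (a leaf function with few locals may skip it) holding the saved frame pointer and return address, any callee-saved registers the function uses, and its local variables; the frame is anchored by the frame pointer (x29, fp) and the stack pointer (sp). The prologue sets up the frame, commonly stp x29, x30, [sp, #-N]! (save fp and lr while moving sp down by N bytes) followed by mov x29, sp; the epilogue reverses it with ldp x29, x30, [sp], #N and returns with ret, which branches to the address in x30. Because each saved x29 points at the caller’s saved pair, the frames form a linked chain that a debugger walks to print a backtrace. Drawing the frame for a call chain is the core skill.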
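The frame discipline can be modeled in a few lines of C. This is a toy model, not real memory: ToyStack, toy_prologue, toy_epilogue and toy_backtrace are hypothetical names, the stack is an array of 8-byte words, and each frame begins with a {saved fp, saved lr} record, as in a typical AArch64 prologue that stores x29 and x30 with one stp. Walking the chain of saved frame pointers reproduces what a debugger's backtrace does.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 256  /* toy stack size in 8-byte words (2 KiB) */

/* Toy stack; no bounds checks. */
typedef struct {
    uint64_t mem[STACK_WORDS]; /* toy stack memory, indexed by word */
    size_t sp;                 /* index of the lowest in-use word; grows down */
    size_t fp;                 /* index of the current frame record */
} ToyStack;

void toy_init(ToyStack *s) {
    s->sp = STACK_WORDS;       /* empty: sp sits one past the highest word */
    s->fp = STACK_WORDS;       /* sentinel: "no caller frame" ends backtraces */
}

/* Frame size in words: the 2-word {fp, lr} record plus locals, rounded up to
 * an even count so every frame is a multiple of 16 bytes. */
static size_t frame_words(size_t local_words) {
    return (2 + local_words + 1) & ~(size_t)1;
}

/* Prologue, modeled on "stp x29, x30, [sp, #-N]!" then "mov x29, sp". */
void toy_prologue(ToyStack *s, uint64_t return_addr, size_t local_words) {
    s->sp -= frame_words(local_words);
    s->mem[s->sp] = s->fp;             /* saved x29: caller's frame record */
    s->mem[s->sp + 1] = return_addr;   /* saved x30: where to return */
    s->fp = s->sp;                     /* x29 now points at this record */
}

/* Epilogue, modeled on "ldp x29, x30, [sp], #N"; returns the address that
 * "ret" would branch to. Assumes sp == fp on entry, as the prologue left it. */
uint64_t toy_epilogue(ToyStack *s, size_t local_words) {
    uint64_t lr = s->mem[s->sp + 1];
    s->fp = (size_t)s->mem[s->sp];
    s->sp += frame_words(local_words);
    return lr;
}

/* Walk the saved-fp chain like a debugger backtrace, newest frame first.
 * Stores up to max return addresses in out[] and returns how many. */
size_t toy_backtrace(const ToyStack *s, uint64_t *out, size_t max) {
    size_t n = 0;
    for (size_t f = s->fp; f != STACK_WORDS && n < max; f = (size_t)s->mem[f])
        out[n++] = s->mem[f + 1];
    return n;
}
```

Pushing three frames with return addresses 0x1000, 0x2000 and 0x3000 and then calling toy_backtrace yields them newest first; running the epilogues in reverse order hands the same addresses back and restores sp and fp to their starting values.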
The AArch64 calling convention (ABI)
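The contract: x0–x7 pass the first eight integer or pointer arguments and x0 returns the result (x8 carries the address for a large struct result); x30 (lr) holds the return address. x19–x28, together with the frame pointer x29 and sp, are callee-saved: a function that uses them must restore them before returning. x0–x18 are caller-saved: x9–x15 are the plain scratch temporaries, x16/x17 may be clobbered by linker-inserted stubs, and x18 is a platform register that some operating systems reserve. The stack must stay 16-byte aligned. This ABI is why a function compiled today links with a library compiled years ago: both honor the same register/stack contract.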
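One way to internalize the register rules is to encode them as a lookup. The sketch below is a hypothetical helper (a64_role and a64_survives_call are illustrative names, not a real API) that classifies x0–x30 per AAPCS64. It deliberately ignores sp and the SIMD/FP registers, where the low 64 bits of v8–v15 are also callee-saved.

```c
#include <stdbool.h>

/* Roles of the AArch64 general-purpose registers x0-x30 under AAPCS64. */
typedef enum {
    A64_INVALID,
    A64_ARG_RESULT,      /* x0-x7: arguments; x0 also carries the result */
    A64_INDIRECT_RESULT, /* x8: address where a large struct result goes */
    A64_TEMP,            /* x9-x15: scratch, caller-saved */
    A64_IP_SCRATCH,      /* x16-x17: may be clobbered by linker veneers/PLT stubs */
    A64_PLATFORM,        /* x18: platform register, reserved on some OSes */
    A64_CALLEE_SAVED,    /* x19-x28 */
    A64_FRAME_POINTER,   /* x29 (fp) */
    A64_LINK_REGISTER    /* x30 (lr): overwritten by every bl */
} A64Role;

A64Role a64_role(int n) {
    if (n < 0 || n > 30) return A64_INVALID;
    if (n <= 7)  return A64_ARG_RESULT;
    if (n == 8)  return A64_INDIRECT_RESULT;
    if (n <= 15) return A64_TEMP;
    if (n <= 17) return A64_IP_SCRATCH;
    if (n == 18) return A64_PLATFORM;
    if (n <= 28) return A64_CALLEE_SAVED;
    if (n == 29) return A64_FRAME_POINTER;
    return A64_LINK_REGISTER;
}

/* True if a caller may rely on xN surviving a call, i.e. a callee that
 * writes xN must save and restore it. Every other register is caller-saved:
 * a caller that still needs the value after a call must spill it itself. */
bool a64_survives_call(int n) {
    A64Role r = a64_role(n);
    return r == A64_CALLEE_SAVED || r == A64_FRAME_POINTER;
}
```

Note that x30 is not in the surviving set: bl overwrites it on every call, which is exactly why a non-leaf function saves it in its own prologue.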
x86-64 equivalent: the System V AMD64 ABI
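x86-64 Linux and macOS use the System V AMD64 calling convention: the same idea as AAPCS64, with different registers and one important structural difference. Integer arguments go in rdi, rsi, rdx, rcx, r8, r9 (six, vs ARM64’s eight), the result returns in rax, and rbx, rbp, r12–r15 (plus rsp) are callee-saved while the rest are caller-saved. Alignment is 16-byte, as on ARM64, but it is defined at the call: rsp must be 16-byte aligned just before the call instruction, so on function entry, after call has pushed the 8-byte return address, rsp is 8 bytes off.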
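The argument-register mapping for the two conventions can be written side by side. This is a sketch with hypothetical helper names, assuming every argument is an integer or pointer scalar: floating-point arguments use v0–v7 (AAPCS64) or xmm0–xmm7 (SysV) and do not consume integer slots, and struct arguments follow additional rules not modeled here.

```c
#include <stddef.h>

/* Which register carries the i-th (0-based) integer/pointer argument?
 * Returns NULL when that argument is passed on the stack instead.
 * (32-bit ints use the w/e sub-registers, e.g. w0 or edi.) */
const char *aapcs64_int_arg(size_t i) {
    static const char *const regs[] = {"x0", "x1", "x2", "x3",
                                       "x4", "x5", "x6", "x7"};
    return i < sizeof regs / sizeof regs[0] ? regs[i] : NULL;
}

const char *sysv_int_arg(size_t i) {
    static const char *const regs[] = {"rdi", "rsi", "rdx",
                                       "rcx", "r8",  "r9"};
    return i < sizeof regs / sizeof regs[0] ? regs[i] : NULL;
}
```

For a call with eight integer arguments, all eight travel in x0–x7 on ARM64, while on x86-64 the seventh and eighth arguments go on the stack.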
The key contrast worth internalizing is how the return address is handled. On ARM64 the bl instruction puts the return address in a register (x30/lr), so a leaf function need not touch memory to return; on x86-64 the call instruction pushes the return address onto the stack, and ret pops it. So an x86-64 prologue typically does push %rbp; mov %rsp, %rbp with the return address already sitting in the frame, whereas a non-leaf ARM64 prologue must explicitly save x30 before making any call of its own.
x86-64 also has the red zone: 128 bytes below rsp that a function may use for temporary data without adjusting rsp, as long as that data is not needed across a call, so leaf functions can often fit their entire frame there. It also uses push/pop as first-class stack instructions, where ARM64 adjusts sp and uses stp/ldp pairs. Same contract concept, different mechanics; recognizing both is what lets you read either disassembly.
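The return-address contrast can be illustrated with a toy C model. Cpu, x86_call, arm_bl and the other names are hypothetical, and real ARM64 code saves x29 and x30 together as a 16-byte pair rather than x30 alone. The model shows why nested x86-64 calls return correctly with no extra work, while a non-leaf ARM64 function that skipped saving x30 would lose its own return address at the inner bl.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_STACK_WORDS 64

/* Toy machine state: just enough to show where each ISA keeps the
 * return address. Not an emulator; no bounds checks. */
typedef struct {
    uint64_t stack[TOY_STACK_WORDS];
    size_t sp;      /* grows down; stack[sp] is the most recent item */
    uint64_t lr;    /* stands in for ARM64 x30 */
} Cpu;

void cpu_init(Cpu *c) {
    c->sp = TOY_STACK_WORDS;
    c->lr = 0;
}

/* x86-64 "call": push the return address onto the stack in memory. */
void x86_call(Cpu *c, uint64_t ret_addr) { c->stack[--c->sp] = ret_addr; }

/* x86-64 "ret": pop the return address and jump to it. */
uint64_t x86_ret(Cpu *c) { return c->stack[c->sp++]; }

/* ARM64 "bl": write the return address into x30; memory is untouched. */
void arm_bl(Cpu *c, uint64_t ret_addr) { c->lr = ret_addr; }

/* ARM64 "ret": jump to whatever x30 holds right now. */
uint64_t arm_ret(const Cpu *c) { return c->lr; }

/* Non-leaf ARM64 prologue/epilogue: spill x30 before making a call and
 * reload it before returning. */
void arm_save_lr(Cpu *c) { c->stack[--c->sp] = c->lr; }
void arm_restore_lr(Cpu *c) { c->lr = c->stack[c->sp++]; }
```

Exercising it with an outer call (return address 0x100) that makes an inner call (0x200): both x86_ret calls come back correctly, but on ARM64 the outer arm_ret still sees 0x200 unless the outer function ran arm_save_lr before its bl and arm_restore_lr before returning.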
LLVM IR and calling conventions
In LLVM IR a function call is just call i32 @f(i32 %x); for scalar arguments like this the IR says nothing about registers or stack slots. The backend lowers that single instruction to the AAPCS64 or SysV AMD64 sequence depending on the target (llc -march=...). Aggregates are a partial exception: Clang already applies some ABI-specific lowering, such as byval and sret, when it emits IR. So the calling convention is precisely the kind of “ISA convention, not computation” detail the IR abstracts away: emit one IR for a call and watch llc produce the ARM64 (bl, args in x0–x7) versus x86-64 (call, args in rdi/rsi/…) forms.
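As a hedged starting point for that experiment, here is a small C caller/callee pair together with a hand-written, target-neutral IR equivalent in a comment. The names f and g are hypothetical; the llc commands use the standard -mtriple option, and the expected output shapes are typical rather than guaranteed. IR emitted by clang -S -emit-llvm records the host's target triple and data layout, so a hand-written .ll without those lines retargets more cleanly.

```c
/* Hypothetical callee and caller; noinline (a GCC/Clang attribute) keeps the
 * call visible in the output even when optimizing. */
__attribute__((noinline)) int f(int x) { return 2 * x; }

int g(int x) { return f(x) + 1; }

/* Hand-written, target-neutral IR for g (no target triple or datalayout):
 *
 *   declare i32 @f(i32)
 *
 *   define i32 @g(i32 %x) {
 *     %r = call i32 @f(i32 %x)
 *     %s = add i32 %r, 1
 *     ret i32 %s
 *   }
 *
 * Saved as g.ll, the same file can be lowered for both targets:
 *   llc -mtriple=aarch64-linux-gnu g.ll -o g_arm64.s
 *   llc -mtriple=x86_64-linux-gnu  g.ll -o g_x86_64.s
 *
 * Typical (not guaranteed) shapes: %x stays where it arrived, w0 on ARM64
 * and edi on x86-64, because it is also f's first argument. The ARM64 code
 * saves x30 before "bl f", since g is not a leaf; the x86-64 code uses
 * "call f", which pushes the return address itself, and usually adjusts rsp
 * so the stack is 16-byte aligned at the call. Both take the result from
 * w0/eax and add 1.
 */
```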
Recursion and the stack
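Recursion is just nested frames: each call gets its own frame with its own locals and return address, so factorial(5) with a base case of n <= 1 is five stacked frames that unwind in the reverse order of their creation. Stack overflow (recursion deeper than the stack allows) and the per-call cost of frame setup follow directly. A call in tail position, one whose result is returned unchanged, can reuse the caller’s frame (tail-call optimization), so the recursion runs in constant stack space; compilers commonly do this at -O2, but C does not guarantee it.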
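A C sketch of both shapes, with hypothetical names. The instrumented fact counts live frames to confirm the depth claim (five frames for n = 5 with an n <= 1 base case); fact_acc moves the multiplication before the call so the recursive call is in tail position. Whether the compiler actually reuses the frame for fact_acc depends on the optimization level, since C does not require it.

```c
#include <stdint.h>

/* Instrumentation for illustration: how many fact() frames are live now,
 * and the deepest the chain got. */
static int live_frames = 0;
static int max_frames = 0;

void reset_frame_counters(void) {
    live_frames = 0;
    max_frames = 0;
}

/* Plain recursion: the multiply happens after the recursive call returns,
 * so each call's frame must stay alive until then. Not a tail call. */
uint64_t fact(unsigned n) {
    live_frames++;                        /* this call's frame is now live */
    if (live_frames > max_frames)
        max_frames = live_frames;
    uint64_t r = (n <= 1) ? 1 : n * fact(n - 1);
    live_frames--;                        /* frame released on return */
    return r;
}

/* Accumulator form: the recursive call is the last action and its result is
 * returned unchanged, so it is in tail position and an optimizing compiler
 * may turn it into a branch that reuses the current frame. */
uint64_t fact_acc(unsigned n, uint64_t acc) {
    if (n <= 1)
        return acc;
    return fact_acc(n - 1, acc * n);
}
```

Note that instrumentation placed after a recursive call, like the live_frames-- in fact, would by itself prevent a tail call, which is why fact_acc is left uninstrumented.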
Theory Exercises
- Lower an if/else, a while loop, and a small switch to compare-and-branch assembly; show where the flags are read.
- Draw the stack frame (saved lr/fp, locals) for a function and write its prologue/epilogue in AArch64.
- State the AArch64 calling convention and the System V AMD64 convention side by side: argument registers (x0–x7 vs rdi/rsi/rdx/rcx/r8/r9), return register, callee-saved sets, and, most importantly, how each handles the return address (x30/lr vs pushed by call).
- Trace the stack through factorial(3), drawing each frame at maximum depth and as it unwinds.
- Show how a tail call can reuse the caller’s frame and why that bounds stack growth.
- Emit LLVM IR for a function that calls another, then lower it with llc to both ISAs; identify where the single IR call became the AAPCS64 vs SysV AMD64 sequence, and explain the red zone’s effect on the x86-64 leaf-function prologue.
Implementation
Write AArch64 functions that call other functions (including a recursive one) with correct prologue/epilogue and ABI compliance; assemble, run, and single-step them watching the stack pointer and frame. Compile equivalent C and compare the compiler’s prologue/epilogue and argument handling to yours. Then compile the same C to x86-64 (Godbolt or clang --target=x86_64-linux-gnu, run under qemu-user) and compare: where ARM64 saves x30, x86-64’s call has already pushed the return address; contrast the two prologues, argument registers, and the red zone. Emit the LLVM IR once and confirm the single call lowers to both conventions.
Measurement / Inspection
In the debugger, inspect the stack at maximum recursion depth and walk the frames (backtrace). Verify callee-saved registers are preserved across a call. Compare -O0 (explicit frames) vs -O2 (frame-pointer omission, inlining) for the same function; observe how optimization changes the stack discipline.
Expected baselines: hand-written calls obey the ABI and link with C code; the backtrace shows the expected frame chain; -O2 often omits frames and inlines, shrinking or eliminating the stack activity visible at -O0. Callee-saved registers survive calls; violating the ABI corrupts the caller’s state, though the symptom may surface well after the faulty call.
Connections
The ABI and stack discipline are what Week 6 builds on to explain where struct/array/object data lives, and what the capstone’s /proc stack inspection reads. Control-flow lowering connects to Week 4’s branch-on-flags and Week 10’s branch-prediction performance. This is the machine-level reality under every C/C++ program in Courses 1–2.
Further Reading
- ARM (Plantz) Ch. 11, 13–15; CS:APP (Bryant & O’Hallaron) Ch. 3 (control + procedures, x86-64).
- “Procedure Call Standard for the Arm 64-bit Architecture (AAPCS64)” and the “System V Application Binary Interface, AMD64 Processor Supplement” — the two authoritative ABIs.
- Godbolt Compiler Explorer — observing prologue/epilogue and calling conventions across both ISAs and in LLVM IR.