Week 1 — Data Representation and Arithmetic: Bits, Hex, Endianness, Two’s Complement, IEEE 754

Course 4 syllabus

Overview

Every higher abstraction in computing — a Python dictionary, a Vulkan vertex buffer, a TCP segment, a CUDA tensor — eventually becomes a specific arrangement of bits in memory, and every number it holds has a finite, fallible representation. This week makes both concrete. The first half is representation: how a uint32_t becomes four bytes, why those bytes appear in a particular order (endianness), what hex buys you, and how alignment forces certain addresses. The second half is numeric semantics: two’s complement and the integer overflow/signedness bugs it produces, and IEEE 754 floating point with its rounding and error behavior. This consolidates what were two separate weeks, because reading a hex dump and understanding what the number means are one skill.

This week is the foundation for the rest of the course. The CPU emulator (Week 3) and ARM64 assembly (Week 4) assume you read register/memory contents in hex without translation; struct layout (Week 6) is this week’s byte-thinking extended to aggregates; the capstone decodes /proc byte-by-byte. The floating-point error analysis builds directly on Course 5’s numerical-computing treatment; here we ground it in the bit pattern.

Readings

  • ARM (Plantz) Ch. 1–3: the stage, bit/byte/word storage formats, and computer arithmetic. Extract: word vs byte addressing, byte ordering, two’s complement, and overflow flags.
  • CA (Fox) Ch. 2: data representation — positional notation, integers, fixed point, floating point. Extract: the formal numeric view and conversion procedures.
  • ARM Ch. 19 (skim): fractional numbers.
  • HLW Ch. 1: the big picture (where user space ends and the kernel begins).
  • (Floating-point rounding, ULP, and error accumulation: builds on Course 5 numerical computing.)

Key Concepts

Positional notation and hex

A number in base \(b\) written \(d_n\dots d_1 d_0\) means \(\sum_{i=0}^{n} d_i b^i\). Hex dominates because each digit encodes exactly 4 bits, so two hex digits make one byte: 0xDEADBEEF reads directly as the bytes DE AD BE EF, while decimal 3735928559 reveals nothing about the byte structure. Most debuggers and memory-dump tools default to hex for this reason.
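A minimal Python sketch of the positional formula and the hex-to-byte grouping described above. The helper names from_digits and hex_bytes are hypothetical, chosen only for illustration.

```python
def from_digits(digits, base):
    """Evaluate d_n ... d_1 d_0 (most significant digit first) in the given base."""
    value = 0
    for d in digits:
        # Horner form of sum(d_i * base**i)
        value = value * base + d
    return value


def hex_bytes(value, width_bytes=4):
    """Format a non-negative integer as two-digit hex bytes, most significant first."""
    text = format(value, "0{}X".format(width_bytes * 2))
    return " ".join(text[i:i + 2] for i in range(0, len(text), 2))


print(from_digits([1, 0, 1, 0, 1, 0], 2))  # 42
print(from_digits([0xD, 0xE], 16))         # 222, i.e. 0xDE
print(hex_bytes(3735928559))               # DE AD BE EF
```

Each pair of hex digits in the last line is one byte, which is the structural information the decimal form hides.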

Endianness and alignment

For v = 0x12345678 stored at address 0x1000, little-endian puts the low byte first (78 56 34 12 at 0x1000 through 0x1003) and big-endian puts the high byte first (12 34 56 78). Both of this course’s ISAs are little-endian (ARM64 as configured on Apple Silicon and Linux, and x86-64), so dumps look the same on either; network byte order is big-endian (hence htonl/htons). Alignment: an N-byte primitive type prefers an address divisible by N, and the compiler inserts padding inside structs to enforce it (revisited in Week 6).
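The byte orders above can be checked with Python's standard struct module. This is a sketch, not a required tool for the week; the helper spaced_hex is a hypothetical name, and the padded size in the last line depends on the platform's native alignment rules.

```python
import struct
import sys


def spaced_hex(data):
    """Render bytes as space-separated two-digit hex, in memory order."""
    return " ".join("{:02x}".format(b) for b in data)


v = 0x12345678

little = struct.pack("<I", v)   # least significant byte first
big = struct.pack(">I", v)      # most significant byte first
network = struct.pack("!I", v)  # network byte order, same as big-endian

print(spaced_hex(little))  # 78 56 34 12
print(spaced_hex(big))     # 12 34 56 78
print(sys.byteorder)       # byte order of the machine running this

# Alignment: one byte followed by a 4-byte integer.
packed_size = struct.calcsize("=BI")  # standard sizes, no padding: 1 + 4
native_size = struct.calcsize("@BI")  # native alignment; padding may be inserted
print(packed_size, native_size)
```

On a typical platform the native size is larger than the packed size because padding bytes are placed before the 4-byte field.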

Two’s complement and overflow

Signed integers use two’s complement: negate by inverting the bits and adding 1. There is a single zero, and the range is asymmetric (\(-2^{n-1}\) to \(2^{n-1}-1\)), so the most negative value has no positive counterpart. Addition and subtraction use the same hardware as unsigned arithmetic; only the interpretation and the overflow condition differ (signed overflow occurs when the carry into the sign bit differs from the carry out of it). Signedness bugs (comparing signed with unsigned values, shifting a negative value) are a classic source of security holes.
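A minimal sketch of these rules on n-bit patterns, using Python integers masked to a fixed width. The function names (to_signed, negate, add_with_overflow) are hypothetical; the overflow test is the carry-in versus carry-out rule stated above.

```python
def to_signed(x, bits):
    """Interpret the low `bits` bits of x as a two's-complement value."""
    mask = (1 << bits) - 1
    x &= mask
    return x - (1 << bits) if x >> (bits - 1) else x


def negate(x, bits):
    """Two's-complement negation: invert every bit, then add 1 (mod 2**bits)."""
    mask = (1 << bits) - 1
    return ((x ^ mask) + 1) & mask


def add_with_overflow(a, b, bits):
    """Add two bit patterns; return (result pattern, signed overflow flag)."""
    mask = (1 << bits) - 1
    low_mask = mask >> 1  # every bit below the sign bit
    a &= mask
    b &= mask
    carry_in = ((a & low_mask) + (b & low_mask)) >> (bits - 1)  # into the sign bit
    total = a + b
    carry_out = total >> bits                                   # out of the sign bit
    return total & mask, carry_in != carry_out


print(hex(negate(0x01, 8)))              # 0xff, the pattern for -1
print(hex(negate(0x80, 8)))              # 0x80: -128 has no positive counterpart
print(add_with_overflow(0x7F, 0x01, 8))  # (128, True): 127 + 1 overflows
print(add_with_overflow(0xFF, 0x01, 8))  # (0, False): -1 + 1 is fine

# The same pattern compares differently under the two interpretations.
print(to_signed(0xFF, 8) < 1, 0xFF < 1)  # True False
```

The last line is the shape of a signed/unsigned comparison bug: the bits are identical, and only the interpretation decides which branch runs.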

IEEE 754 floating point

A normalized float has the value \((-1)^{s} \times 1.f \times 2^{e-\text{bias}}\), where \(s\) is the sign bit, \(f\) the stored fraction (mantissa) bits, and \(e\) the stored exponent. fp32 splits its bits 1-8-23 (sign-exponent-fraction, bias 127) and fp64 splits 1-11-52 (bias 1023). The leading 1 is implicit for normalized values; the format also defines subnormals, signed zeros, infinities, and NaN. The default rounding mode is round-to-nearest-even, under which each basic operation can be off by up to half a ULP, and these errors accumulate (the failure mode Course 5 analyzes). Non-associativity (\((a+b)+c \ne a+(b+c)\)) and catastrophic cancellation are the practical consequences.
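A sketch of the fp32 field layout using Python's struct module to get at the bit pattern. The names fp32_fields and fp32_value are hypothetical, and Python's own floats are fp64, so values are rounded to fp32 when packed.

```python
import struct


def fp32_fields(x):
    """Return (sign, biased exponent, fraction) of x after rounding to fp32."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def fp32_value(sign, exponent, fraction):
    """Rebuild the numeric value from the three fp32 fields."""
    if exponent == 0xFF:
        # All-ones exponent: infinity if the fraction is zero, otherwise NaN.
        magnitude = float("nan") if fraction else float("inf")
    elif exponent == 0:
        # Zero or subnormal: no implicit leading 1, fixed exponent of -126.
        magnitude = fraction * 2.0 ** (-126 - 23)
    else:
        # Normalized: implicit leading 1, exponent bias 127.
        magnitude = (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)
    return -magnitude if sign else magnitude


print(fp32_fields(1.0))                    # (0, 127, 0)
print(fp32_fields(-0.15625))               # (1, 124, 2097152)
print(fp32_value(*fp32_fields(-0.15625)))  # -0.15625
print(fp32_fields(2.0 ** -149))            # (0, 0, 1): smallest subnormal
```

The value -0.15625 is 1.25 × 2^-3, so the stored exponent is 124 and the fraction field holds 0.25 × 2^23.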

Theory Exercises

  1. Convert 42, 255, 256, 1000, 65535 between decimal, binary, and hex by hand.
  2. Store 0x12345678 little- and big-endian; decode the dump 78 56 34 12 as unsigned int, signed int, and ASCII.
  3. Compute two’s-complement negation for several values; identify the overflow condition for signed addition and give a buggy signed/unsigned comparison.
  4. Decode an fp32 bit pattern to its value and back; find the largest integer exactly representable in fp32.
  5. Construct a sum where floating-point non-associativity changes the result; quantify the error in ULPs (Course 5 link).

Implementation

Write a dump_bytes utility (hex + ASCII, like xxd) and round-trip integers/floats through it. Implement two’s-complement add/sub with overflow detection, and an fp32 encoder/decoder that prints sign/exponent/mantissa and the reconstructed value. Verify against the language’s native types.
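One possible starting shape for dump_bytes, offered as a sketch rather than a reference solution. The column layout (8-digit offset, hex bytes, ASCII with "." for non-printable bytes) is an assumption modeled loosely on xxd and is not byte-identical to its output.

```python
import struct


def dump_bytes(data, width=16):
    """Return an xxd-like dump: offset, hex bytes, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("{:02x}".format(b) for b in chunk)
        # Bytes outside the printable ASCII range are shown as "."
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on a short last row.
        lines.append("{:08x}  {:<{pad}}  {}".format(
            offset, hex_part, ascii_part, pad=width * 3 - 1))
    return "\n".join(lines)


print(dump_bytes(b"Hello, bytes!\x00\x01\x02 second row"))
print(dump_bytes(struct.pack("<i", -1), width=4))   # ff ff ff ff
print(dump_bytes(struct.pack("<f", 1.0), width=4))  # 00 00 80 3f
```

Round-tripping means packing a native value, dumping it, and unpacking the same bytes to confirm the original value comes back.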

Measurement / Inspection

Use the tool to inspect the byte layout of uint8/16/32/64, a negative int32, and several floats (including a subnormal, ∞, NaN). Demonstrate endianness by reinterpreting the same bytes at different widths. Show a concrete non-associative floating-point sum and an integer-overflow wraparound.

Expected baselines: byte dumps match the platform’s little-endian layout; two’s-complement arithmetic matches native results including the overflow flag; the fp32 decoder reproduces values exactly; the non-associativity demo produces a measurable, explainable discrepancy.
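A short sketch of three of these inspections, written with Python's struct module. The specific operands are illustrative choices, not prescribed test data, and the float sums use Python's native fp64.

```python
import struct

# Same four bytes, reinterpreted at different widths (little-endian).
raw = bytes([0x78, 0x56, 0x34, 0x12])
as_u32 = struct.unpack("<I", raw)[0]
as_u16 = struct.unpack("<2H", raw)
as_u8 = struct.unpack("<4B", raw)
print(hex(as_u32))               # 0x12345678
print([hex(h) for h in as_u16])  # ['0x5678', '0x1234']
print([hex(b) for b in as_u8])   # ['0x78', '0x56', '0x34', '0x12']

# Non-associative sum: the 1.0 is absorbed when added to -1e20 first.
left = (1e20 + -1e20) + 1.0
right = 1e20 + (-1e20 + 1.0)
print(left, right)               # 1.0 0.0

# Integer wraparound: INT32_MAX + 1, kept to 32 bits and read back as signed.
wrapped_bits = (0x7FFFFFFF + 1) & 0xFFFFFFFF
wrapped = struct.unpack("<i", struct.pack("<I", wrapped_bits))[0]
print(wrapped)                   # -2147483648
```

The discrepancy in the sum is explainable from spacing: near 1e20 adjacent fp64 values are far more than 1.0 apart, so adding 1.0 rounds back to the same value.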

Connections

Byte- and hex-fluency is assumed by the CPU emulator (Week 3), ARM64 assembly (Week 4), and the capstone’s /proc decoding (Week 10). Two’s complement reappears in the ALU of Week 2/3; IEEE 754 connects to Course 5’s numerical analysis and to Course 1’s mixed-precision work. Alignment is the seed of Week 6’s struct layout.

Further Reading

  • ARM (Plantz) Ch. 1–3, 19; CA (Fox) Ch. 2.
  • What Every Computer Scientist Should Know About Floating-Point Arithmetic (Goldberg).
  • Course 5 numerical-computing notes (conditioning, ULP, stability).