Week 13 — Coding Projects

Core

Implement parallel reduction and study summation error.
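As a warm-up, the order-dependence of floating-point addition can be shown on the CPU alone. A minimal sketch (the helper name `left_fold_sum` is mine; `math.fsum` serves as the correctly rounded reference):

```python
import math

def left_fold_sum(xs):
    """Naive sequential accumulation, left to right."""
    acc = 0.0
    for x in xs:
        acc += x
    return acc

# Adversarial input: 1.0 is smaller than the spacing between
# adjacent float64 values near 1e16, so it is absorbed when
# added to the large partial sum.
xs = [1e16, 1.0, -1e16]

print(left_fold_sum(xs))                 # 0.0 — the 1.0 is lost
print(left_fold_sum([1e16, -1e16, 1.0]))  # 1.0 — cancel first, then add
print(math.fsum(xs))                     # 1.0 — exactly rounded reference
```

The same mechanism is why a GPU reduction, which sums in a tree order, can legitimately differ from a sequential CPU loop on ill-conditioned inputs.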

  • NumPy: Compare a naive left-fold, explicit pairwise summation, and np.sum. Build adversarial floating-point inputs where summation order changes the result.
  • Metal: Parallel reduction kernel for sum and max. Multi-stage reduction for large arrays. · Reading: MBT — compute optimization, threadgroup reductions, synchronization.
  • Vulkan: Compute reduction kernel with workgroup-local reductions and barriers. · Reading: Vulkan Book — workgroup-local reductions, barriers, multi-pass reduction structure.
  • CUDA: Standard reduction ladder with warp/block reduction. · Reading: CUDA Book — optimized reductions, warp/block reduction concepts, memory coalescing.
  • Stretch: Add max reduction and norm reduction. Compare float16 and float32 accumulation accuracy.
  • Verify: Summation order changes result for adversarial values · GPU reduction matches CPU reference within expected tolerance · GPU speedup appears for large enough arrays.
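Before porting to Metal, Vulkan, or CUDA, the multi-stage structure shared by all three backends can be prototyped on the CPU. A NumPy sketch (function name and block size are my choices): stage 1 produces one partial sum per fixed-size block, mirroring a threadgroup/workgroup reduction, and stage 2 reduces the partials. The result is checked against `math.fsum` within a tolerance, as the Verify bullet asks.

```python
import math
import numpy as np

def block_reduce_sum(x, block=256):
    """Two-stage reduction: per-block partial sums, then reduce the partials.

    Stage 1 corresponds to one threadgroup/workgroup producing one
    partial sum; stage 2 reduces the partials. For very large inputs
    the second stage would itself run in blocks (multi-pass).
    """
    pad = (-x.size) % block                 # pad to a whole number of blocks
    padded = np.concatenate([x, np.zeros(pad, dtype=x.dtype)])
    partials = padded.reshape(-1, block).sum(axis=1)   # stage 1
    return partials.sum()                              # stage 2

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000).astype(np.float32)

two_stage = block_reduce_sum(x)
reference = math.fsum(x.astype(np.float64))
# Tolerance scales with the total magnitude of the data, not with n alone.
tol = 1e-6 * float(np.abs(x).astype(np.float64).sum())
assert abs(float(two_stage) - reference) <= tol
```

The same skeleton extends to max (replace `sum` with `max`, pad with `-inf`) and to a norm (square elementwise first, square-root the final partial).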
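For the float16/float32 comparison in the stretch goal, a quick CPU experiment is enough to see why accumulator precision dominates (helper name is mine; NumPy emulates float16 arithmetic in software, so this is slow but faithful):

```python
import numpy as np

def left_fold(x):
    """Accumulate left to right in the array's own dtype."""
    acc = x.dtype.type(0)
    for v in x:
        acc = acc + v        # float16 + float16 stays float16
    return acc

x16 = np.full(10_000, 0.1, dtype=np.float16)
x32 = x16.astype(np.float32)
ref = float(x16.astype(np.float64).sum())   # near-exact reference

err16 = abs(float(left_fold(x16)) - ref)
err32 = abs(float(left_fold(x32)) - ref)
print(err16, err32)
```

In float16 the running sum plateaus once increments fall below half an ulp of the accumulator (around 256 here), so `err16` is enormous while `err32` stays small; the same effect, in milder form, is what a float16 GPU reduction must guard against (e.g. by accumulating partials in float32).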