Week 13 — Coding Projects
Core
Implement parallel reduction and study summation error.
- NumPy: Compare a naive left-fold sum, recursive pairwise summation, and np.sum (which uses pairwise summation internally). Build adversarial floating-point inputs where summation order changes the result.
- Metal: Parallel reduction kernel for sum and max. Multi-stage reduction for large arrays. · Reading: MBT — compute optimization, threadgroup reductions, synchronization.
- Vulkan: Compute reduction kernel with workgroup-local reductions and barriers. · Reading: Vulkan Book — workgroup-local reductions, barriers, multi-pass reduction structure.
- CUDA: Standard reduction ladder with warp/block reduction. · Reading: CUDA Book — optimized reductions, warp/block reduction concepts, memory coalescing.
- Stretch: Add max and norm reductions. Compare float16 and float32 accumulation accuracy.
- Verify: Summation order changes result for adversarial values · GPU reduction matches CPU reference within expected tolerance · GPU speedup appears for large enough arrays.
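For the NumPy item, a minimal sketch of the left-fold vs. pairwise comparison plus one adversarial input (function names here are illustrative, not a fixed API):

```python
import math
from functools import reduce

def left_fold_sum(xs):
    # naive sequential accumulation: rounding error grows O(n)
    return reduce(lambda a, b: a + b, xs, 0.0)

def pairwise_sum(xs):
    # recursive pairwise summation: rounding error grows O(log n)
    if len(xs) <= 2:
        return sum(xs)
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])

# adversarial case: the large values cancel, but a left fold
# absorbs the 1.0 into 1e16 first and loses it entirely
vals = [1e16, 1.0, -1e16]
print(left_fold_sum(vals))   # 0.0 — the 1.0 is lost
print(math.fsum(vals))       # 1.0 — exact reference
```

`math.fsum` is exact and makes a good CPU reference when checking GPU results; pairwise summation mainly helps on long arrays of same-sign values, not on this three-element cancellation case.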
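The Metal, Vulkan, and CUDA kernels all share the same multi-stage shape: each workgroup tree-reduces its block to one partial, then a second stage reduces the partials. A CPU sketch of that structure in Python (block size and function names are illustrative, not any GPU API):

```python
def tree_reduce(xs, op):
    # one "workgroup": halve the live elements each level,
    # carrying a leftover element when the count is odd
    xs = list(xs)
    while len(xs) > 1:
        nxt = [op(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2:
            nxt.append(xs[-1])
        xs = nxt
    return xs[0]

def multi_stage_reduce(xs, op, block=4):
    # stage 1: each block reduces to a single partial result
    partials = [tree_reduce(xs[i:i + block], op)
                for i in range(0, len(xs), block)]
    # stage 2: reduce the partials (recurse or loop for huge inputs)
    return tree_reduce(partials, op)

print(multi_stage_reduce(list(range(10)), lambda a, b: a + b))  # 45
print(multi_stage_reduce(list(range(10)), max))                 # 9
```

On the GPU, each `while` level is a barrier-separated step: `threadgroup_barrier` in Metal, `barrier()` in a GLSL compute shader, `__syncthreads()` in CUDA. Passing `op` shows why sum and max reuse the same kernel skeleton.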
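For the float16/float32 stretch item, one quick experiment: naive float16 accumulation of ones stalls once the running sum reaches 2^11 = 2048, because above that the gap between adjacent half-precision values exceeds 1 (numbers assume IEEE binary16/binary32):

```python
import numpy as np

acc16 = np.float16(0.0)
acc32 = np.float32(0.0)
for _ in range(3000):
    acc16 = np.float16(acc16 + np.float16(1.0))  # rounded to float16 every step
    acc32 = np.float32(acc32 + np.float32(1.0))

print(acc16)  # 2048.0 — 2048 + 1 rounds back to 2048 in half precision
print(acc32)  # 3000.0 — exact, well below float32's 2^24 integer limit
```

The same stall shows up in a GPU kernel that accumulates in float16; accumulating in float32 and only storing float16 avoids it.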