Why GPU parallelism feels different
Numbers tell you how much faster the GPU is. These animations show why. Each canvas below illustrates one piece of the architectural story — execution model, memory hierarchy, and reduction strategy.
WIP: The canvas animation logic is being written separately. The DOM slots are wired up below — drop your render loop into the matching canvasRefs.current[i] inside a useEffect.
Thread execution model
Animate how 8 CPU threads each grind through their slice of the workload one item at a time, compared with thousands of GPU threads advancing together in lock-step 32-thread warps.
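A back-of-the-envelope model the animation could pace itself against. The thread counts and work size here are illustrative assumptions, not measurements — the point is just the ratio between the two step counts.

```typescript
// Toy timing model: one "step" is a tick in which every live thread
// finishes exactly one work item. Slowest slice bounds total time.
function stepsToFinish(items: number, threads: number): number {
  return Math.ceil(items / threads);
}

const WORK_ITEMS = 65536;                               // hypothetical workload
const cpuSteps = stepsToFinish(WORK_ITEMS, 8);          // 8 CPU threads
const gpuSteps = stepsToFinish(WORK_ITEMS, 128 * 32);   // 128 warps of 32 lanes

console.log(cpuSteps); // 8192 ticks
console.log(gpuSteps); // 16 ticks
```

Driving both progress bars from the same tick counter makes the gap visceral: the GPU side finishes while the CPU side has barely started.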
Memory hierarchy traversal
Visualize a stencil access pattern: every cell touches its neighbors. Show why a tiled kernel that loads neighbors into shared memory once beats one that re-reads them from global memory.
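A simple way to quantify what the animation shows, using a 1-D 3-point stencil as a stand-in (tile size and array length are made-up parameters; real kernels also have caches in play):

```typescript
// Count global-memory reads for a 1-D 3-point stencil over n cells.
function naiveReads(n: number): number {
  // Every output independently re-reads its 3 inputs from global memory.
  return 3 * n;
}

function tiledReads(n: number, tile: number): number {
  // Each tile loads its own cells plus a 1-cell halo on each side into
  // shared memory exactly once, then all reuse happens on-chip.
  const tiles = Math.ceil(n / tile);
  return tiles * (tile + 2);
}

console.log(naiveReads(1024));      // 3072 global reads
console.log(tiledReads(1024, 128)); // 8 tiles * 130 loads = 1040 global reads
```

The counters make a natural on-canvas scoreboard: the tiled side's read count grows roughly 3x slower, and the gap widens with stencil radius.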
Tree-style reduction
Animate a 32-lane warp folding pairs of partial sums together — log2(32) = 5 halving steps to reduce a whole warp without touching shared memory.
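The folding pattern the animation would trace, sketched as a plain array simulation (lane indexing mirrors the shuffle-down idea, but this is a model of the data movement, not device code):

```typescript
// Pairwise tree reduction over a 32-lane warp: at each step, lane i
// adds in the value held by lane i + offset, and the offset halves.
function warpReduce(lanes: number[]): number {
  const v = lanes.slice();
  let steps = 0;
  for (let offset = v.length / 2; offset >= 1; offset /= 2) {
    for (let i = 0; i < offset; i++) v[i] += v[i + offset];
    steps++;
  }
  console.log(steps); // log2(32) = 5 steps for a full warp
  return v[0];        // lane 0 ends up holding the total
}

const sum = warpReduce(Array.from({ length: 32 }, (_, i) => i + 1));
console.log(sum); // 1 + 2 + ... + 32 = 528
```

Each outer-loop iteration is one animation frame group: draw the surviving lanes, arc the offset partner's value over, and halve the active region.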