Why GPU parallelism feels different
Numbers tell you how much faster the GPU is. These animations show why. Each canvas below illustrates one piece of the architectural story — execution model, memory hierarchy, and reduction strategy.
WIP: The canvas animation logic is being written separately. The DOM slots are wired up below — drop your render loop into the matching canvasRefs.current[i] inside a useEffect.
Thread execution model
Animate how 8 CPU threads each grind through their slice of the workload one item at a time, compared with thousands of GPU threads advancing together in lock-step 32-thread warps.
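A back-of-the-envelope model the animation could pace itself against. The thread counts and work size here are illustrative assumptions, not measurements — the point is just the ratio between the two step counts.

```typescript
// Toy timing model: one "step" is a tick in which every live thread
// finishes exactly one work item. Slowest slice bounds total time.
function stepsToFinish(items: number, threads: number): number {
  return Math.ceil(items / threads);
}

const WORK_ITEMS = 65536;                               // hypothetical workload
const cpuSteps = stepsToFinish(WORK_ITEMS, 8);          // 8 CPU threads
const gpuSteps = stepsToFinish(WORK_ITEMS, 128 * 32);   // 128 warps of 32 lanes

console.log(cpuSteps); // 8192 ticks
console.log(gpuSteps); // 16 ticks
```

Driving both progress bars from the same tick counter makes the gap visceral: the GPU side finishes while the CPU side has barely started.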
Memory hierarchy traversal
Visualize a stencil access pattern: every cell touches its neighbors. Show why a tiled kernel that loads neighbors into shared memory once beats one that re-reads them from global memory.
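A simple way to quantify what the animation shows, using a 1-D 3-point stencil as a stand-in (tile size and array length are made-up parameters; real kernels also have caches in play):

```typescript
// Count global-memory reads for a 1-D 3-point stencil over n cells.
function naiveReads(n: number): number {
  // Every output independently re-reads its 3 inputs from global memory.
  return 3 * n;
}

function tiledReads(n: number, tile: number): number {
  // Each tile loads its own cells plus a 1-cell halo on each side into
  // shared memory exactly once, then all reuse happens on-chip.
  const tiles = Math.ceil(n / tile);
  return tiles * (tile + 2);
}

console.log(naiveReads(1024));      // 3072 global reads
console.log(tiledReads(1024, 128)); // 8 tiles * 130 loads = 1040 global reads
```

The counters make a natural on-canvas scoreboard: the tiled side's read count grows roughly 3x slower, and the gap widens with stencil radius.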
Tree-style reduction
Animate a 32-lane warp folding pairs of partial sums together — log2(32) = 5 halving steps to reduce a whole warp without touching shared memory.
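The folding pattern the animation would trace, sketched as a plain array simulation (lane indexing mirrors the shuffle-down idea, but this is a model of the data movement, not device code):

```typescript
// Pairwise tree reduction over a 32-lane warp: at each step, lane i
// adds in the value held by lane i + offset, and the offset halves.
function warpReduce(lanes: number[]): number {
  const v = lanes.slice();
  let steps = 0;
  for (let offset = v.length / 2; offset >= 1; offset /= 2) {
    for (let i = 0; i < offset; i++) v[i] += v[i + offset];
    steps++;
  }
  console.log(steps); // log2(32) = 5 steps for a full warp
  return v[0];        // lane 0 ends up holding the total
}

const sum = warpReduce(Array.from({ length: 32 }, (_, i) => i + 1));
console.log(sum); // 1 + 2 + ... + 32 = 528
```

Each outer-loop iteration is one animation frame group: draw the surviving lanes, arc the offset partner's value over, and halve the active region.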