Computer architecture · final project

CPU vs GPU

Two architectures, three workloads, one wall-clock.

cores: 8 P · 16 T
L3: 36 MB
SMs: 170
VRAM: 32 GB
The serial side · CPU

Fewer cores, deeper pipelines.

This is the CPU side of every speedup number on this site. It gets the same composition as the GPU in the hero, mirrored: the model sits on the left so the two chips face each other across the page.

Built to chase the next instruction

  • Out-of-order, speculative, branch-prediction heavy.
  • Wins when the next computation depends on the last one.
  • Our OpenMP runs use all 16 threads; the single-threaded baseline uses one.
8/16 cores / threads
5.4 GHz boost
36 MB L3 cache
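
The difference between the one-thread baseline and an all-threads OpenMP run can be sketched with a dot product. This is a minimal illustration, not the project's code: the function names `dot_serial` and `dot_openmp` are hypothetical, and the only OpenMP feature assumed is a `parallel for` with a `reduction` clause.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-threaded baseline: one core walks the whole array.
double dot_serial(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Hypothetical OpenMP variant: the loop is split across threads and each
// thread keeps a private partial sum that OpenMP combines at the end.
// Without OpenMP enabled the pragma is ignored and the loop runs serially.
double dot_openmp(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    const long long n = static_cast<long long>(a.size());
    #pragma omp parallel for reduction(+ : sum)
    for (long long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        sum += a[k] * b[k];
    }
    return sum;
}
```

With GCC or Clang, OpenMP is enabled with `-fopenmp`, and the thread count can be set through the `OMP_NUM_THREADS` environment variable (16 on the chip described here). Floating-point sums may differ in the last bits between the two versions because the parallel version adds in a different order.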
Stack

Tools we used

CUDA: GPU kernels, shared memory, warp shuffles
OpenMP: CPU parallel-for baselines on 8 cores
C++17: Both CPU and GPU host code
Next.js 14: App Router + TypeScript for this report
Recharts: Log-log timing plots
Headline numbers

Key findings

The full breakdown is on the benchmarks page; these are the results that surprised us.

98×
Mandelbrot speedup
Optimized GPU vs single-threaded CPU at 4096×4096
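
One likely reason this workload maps so well to a GPU is that every pixel is computed independently of its neighbours. The per-pixel work can be sketched as a standard escape-time loop; the function name `mandelbrot_iterations` and the `max_iter` parameter are hypothetical and not taken from the project.

```cpp
#include <cassert>
#include <complex>

// Counts how many iterations of z = z*z + c run before |z| exceeds 2,
// capped at max_iter. Points that never escape return max_iter.
int mandelbrot_iterations(std::complex<double> c, int max_iter) {
    assert(max_iter > 0);
    std::complex<double> z{0.0, 0.0};
    int i = 0;
    // std::norm returns |z|^2, so comparing against 4 avoids a square root.
    while (i < max_iter && std::norm(z) <= 4.0) {
        z = z * z + c;
        ++i;
    }
    return i;
}
```

A single-threaded CPU version would call this once per pixel in a nested loop over the image, while a GPU version would typically assign one thread to each pixel.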
40×
Dot product speedup
Optimized GPU vs single-threaded CPU at 1B elements
32×
Heat equation speedup
Tiled GPU vs single-threaded CPU at 2048×2048, 1000 steps
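
For context, one time step of a 2D heat equation is commonly written as a five-point stencil. The sketch below assumes an explicit finite-difference scheme with fixed boundary values, which the page does not confirm; the name `heat_step` and the coefficient `alpha` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One explicit time step on an n x n grid stored in row-major order.
// alpha stands for k * dt / dx^2 (an assumed coefficient); this explicit
// scheme is stable in 2D only when alpha <= 0.25.
std::vector<double> heat_step(const std::vector<double>& u, std::size_t n, double alpha) {
    assert(u.size() == n * n);
    std::vector<double> next = u;  // boundary cells keep their old values
    for (std::size_t y = 1; y + 1 < n; ++y) {
        for (std::size_t x = 1; x + 1 < n; ++x) {
            const std::size_t i = y * n + x;
            // Each interior cell moves toward the average of its four neighbours.
            next[i] = u[i] + alpha * (u[i - 1] + u[i + 1] + u[i - n] + u[i + n] - 4.0 * u[i]);
        }
    }
    return next;
}
```

Each cell reads four neighbours, so adjacent cells reuse the same inputs. A tiled GPU kernel typically stages a block of the grid in shared memory to exploit that reuse.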
Naive → optimized GPU
The reduction kernel gains the most from warp-level merging
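
The data movement behind a warp-level reduction can be illustrated on the CPU. The sketch below is plain C++ that only emulates the pattern across 32 lanes; it is not the project's kernel. In CUDA the same halving steps would typically use `__shfl_down_sync` to pass values between lanes' registers without going through shared memory. The names `kWarpSize` and `warp_reduce_sum` are hypothetical.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kWarpSize = 32;

// CPU-side emulation: each array slot stands in for one lane's register.
float warp_reduce_sum(std::array<float, kWarpSize> lanes) {
    // The offset halves each round: 16, 8, 4, 2, 1 (five rounds for 32 lanes).
    for (std::size_t offset = kWarpSize / 2; offset > 0; offset /= 2) {
        // Lane i adds the value held by lane i + offset. Only lanes below
        // offset are updated here, because higher lanes never feed into
        // lane 0's final result.
        for (std::size_t i = 0; i < offset; ++i) {
            lanes[i] += lanes[i + offset];
        }
    }
    return lanes[0];  // lane 0 now holds the sum of all 32 inputs
}
```

The sum of 32 values finishes in five rounds instead of 31 sequential additions, which is one plausible reason a reduction benefits from this step more than the other kernels.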