Team

Five people, one report

We split the work along architectural boundaries — one person owned the CPU baselines, two owned the GPU implementations (naive and optimized), one owned profiling, and one built this site. Every benchmark number on this site was produced by the team and reviewed by at least two people.

Contributors

Who did what

Yersin

CPU Baselines (C++ · OpenMP)
  • Wrote the single-threaded reference implementations for all three workloads
  • Built the OpenMP parallel-for versions and tuned scheduling policies
  • Set up the timing harness with warm-up runs and median reporting

Kirill

Basic CUDA Kernels (CUDA · C++)
  • Authored the naive GPU kernels for Mandelbrot, dot product, and the heat stencil
  • Set up host↔device memory transfer and grid/block size selection
  • Verified correctness against the CPU baselines to within a floating-point tolerance

Alan

GPU Optimizations (CUDA · Shared memory)
  • Rewrote the dot product with warp shuffles and a shared-memory tree reduction
  • Implemented shared-memory tiling for the heat-equation stencil
  • Tuned block sizes and occupancy with Nsight Compute

Zhanbolat

Profiling & Benchmarking (Nsight · perf)
  • Built the sweep script that runs every implementation across the size grid
  • Profiled hotspots and produced the roofline charts for each kernel
  • Validated the methodology by comparing CUDA-event timings against wall-clock measurements

Kairat

Frontend & Visualization (Next.js · TypeScript · Recharts)
  • Built this site (Next.js 14 App Router + SASS modules)
  • Designed the dark technical theme and log-log chart components
  • Wrote the report copy and structured the crossover analysis

Disclosure

AI tools we used

A short, honest account of where AI assistance helped — and where it didn't.

Claude Code

Scaffolded the Next.js report site, drafted CUDA kernel skeletons, and helped explain warp-level reduction patterns while we were tuning.

Cursor

In-editor refactors and tab-completion across the C++ baseline files. Used for renaming, splitting headers, and quick OpenMP pragma tweaks.

Every measurement reported on this site comes from code we wrote and ran ourselves. AI tools were used for scaffolding, explanation, and iteration — never for generating benchmark numbers.