Team

Five people, one report

We split the work along architectural boundaries — one person owned the CPU baselines, two owned the GPU implementations (naive and optimized), one owned profiling, and one built this site. Every benchmark number on this site was produced by the team and reviewed by at least two people.

Contributors

Who did what

Yersin

CPU Baselines (C++ · OpenMP)
  • Wrote the single-threaded reference implementations for all three workloads
  • Built the OpenMP parallel-for versions and tuned scheduling policies
  • Set up the timing harness with warm-up runs and median reporting

Kirill

Basic CUDA Kernels (CUDA · C++)
  • Authored the naive GPU kernels for Mandelbrot, dot product, and the heat stencil
  • Set up host↔device memory transfer and grid/block size selection
  • Verified correctness against the CPU baselines to within a floating-point tolerance

Alan

GPU Optimizations (CUDA · Shared memory)
  • Rewrote the dot product with warp shuffles and a shared-memory tree reduction
  • Implemented shared-memory tiling for the heat-equation stencil
  • Tuned block sizes and occupancy with Nsight Compute

Zhanbolat

Profiling & Benchmarking (Nsight · perf)
  • Built the sweep script that runs every implementation across the size grid
  • Profiled hotspots and produced the roofline charts for each kernel
  • Validated the methodology by comparing CUDA-event timings against wall-clock measurements

Kairat

Frontend & Visualization (Next.js · TypeScript · Recharts)
  • Built this site (Next.js 14 App Router + SASS modules)
  • Designed the dark technical theme and log-log chart components
  • Wrote the report copy and structured the crossover analysis

Disclosure

AI tools we used

A short, honest account of where AI assistance helped — and where it didn't.

Claude Code

Scaffolded the Next.js report site, drafted CUDA kernel skeletons, and helped explain warp-level reduction patterns while we were tuning.

Cursor

In-editor refactors and tab-completion across the C++ baseline files. Used for renaming, splitting headers, and quick OpenMP pragma tweaks.

Every measurement reported on this site comes from code we wrote and ran ourselves. AI tools were used for scaffolding, explanation, and iteration — never for generating benchmark numbers.