Yersin
CPU BaselinesC++ · OpenMP- Wrote the single-threaded reference implementations for all three workloads
- Built the OpenMP parallel-for versions and tuned scheduling policies
- Set up the timing harness with warm-up runs and median reporting
We split the work along architectural boundaries — one person owned the CPU baselines, two owned the GPU implementations (naive and optimized), one owned profiling, and one built this site. Every benchmark number on this site was produced by the team and reviewed by at least two people.
A short, honest account of where AI assistance helped — and where it didn't.
Scaffolded the Next.js report site, drafted CUDA kernel skeletons, and helped explain warp-level reduction patterns while we were tuning.
In-editor refactors and tab-completion across the C++ baseline files. Used for renaming, splitting headers, and quick OpenMP pragma tweaks.
Every measurement reported on this site comes from code we wrote and ran ourselves. AI tools were used for scaffolding, explanation, and iteration — never for generating benchmark numbers.