

Systems Reading Group - AI for Performance Engineering: KernelBench & SWE-fficiency
If you're excited about the future of AI writing performant software, this is the session for you!
As AI agents have shown surprising coding and software engineering capabilities (e.g., 78% on SWE-Bench), we explore the next frontier of software development: performance engineering. Beyond the “what” of software implementation, we care deeply about the “how,” including code quality, maintainability, and, in particular, performance.
Performance is critical to every piece of software, from GPU kernels to data analytics. Yet optimizing it is challenging: an AI agent must preserve correctness while improving performance. In this session, you will learn about two lines of work:
GPU kernel generation (KernelBench and follow-up works)
Repository-level software engineering (SWE-fficiency)
We will walk through evaluations of current model capabilities and promising approaches, from test-time scaling to reinforcement learning.
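Both benchmarks share the same core evaluation idea: a candidate implementation must match a reference implementation's outputs and run faster than it. As a rough sketch of that loop (the function names and toy workload here are illustrative, not the actual KernelBench or SWE-fficiency harness):

```python
import time

def reference(xs):
    # Reference implementation: elementwise square via an explicit loop.
    out = []
    for x in xs:
        out.append(x * x)
    return out

def candidate(xs):
    # Candidate "optimized" implementation (stand-in for agent-generated code).
    return [x * x for x in xs]

def evaluate(ref, cand, inputs, trials=5):
    # Correctness gate: the candidate must match the reference on all inputs.
    correct = all(cand(xs) == ref(xs) for xs in inputs)

    # Performance: best-of-N wall-clock time for each implementation,
    # so one noisy run doesn't skew the comparison.
    def best_time(fn):
        times = []
        for _ in range(trials):
            t0 = time.perf_counter()
            for xs in inputs:
                fn(xs)
            times.append(time.perf_counter() - t0)
        return min(times)

    speedup = best_time(ref) / best_time(cand)
    return correct, speedup

inputs = [list(range(1000)) for _ in range(10)]
correct, speedup = evaluate(reference, candidate, inputs)
print(correct)
```

The real benchmarks are much stricter (e.g., comparing tensor outputs on randomized inputs and timing on GPU hardware), but the two-part check, correctness first, then measured speedup, is the same.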
GPU Kernel Generation [KernelBench]
Led by Simon Guo, CS PhD Student at Stanford
KernelBench Blog: https://scalingintelligence.stanford.edu/blogs/kernelbench/
KernelBench Paper: https://arxiv.org/abs/2502.10517
Extensions: techniques that try to hill-climb KernelBench
Overview: https://simonguo.tech/blog/2025-10-automated-gpu-kernels.html
Search: Surprisingly Fast AI-Generated Kernels We Didn’t Mean to Publish (Yet), by Anne Ouyang https://crfm.stanford.edu/2025/05/28/fast-kernels.html
RL: Kevin: Multi-Turn RL for Generating CUDA Kernels https://arxiv.org/abs/2507.11948
Repository-Level Software Performance Optimization [SWE-fficiency]
Led by Jeffrey Ma, CS PhD Student at Harvard
SWE-fficiency Website: https://swefficiency.com/
SWE-fficiency Paper: https://www.arxiv.org/abs/2511.06090