CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

Weiqiang Lou
2026.03.03
· arXiv · by 이호민
#Agent #CUDA #Kernel Generation #LLM #Reinforcement Learning

Key Points

  • CUDA Agent introduces a large-scale agentic reinforcement learning system specifically designed to improve large language models' capabilities in generating high-performance CUDA kernels.
  • The system achieves this through a scalable data synthesis pipeline, a skill-augmented CUDA development environment with automated verification and profiling, and novel RL algorithmic techniques that ensure stable multi-turn training.
  • CUDA Agent achieves state-of-the-art results on KernelBench, consistently outperforming torch.compile and surpassing leading proprietary models, especially on the most complex tasks, by learning sophisticated optimization strategies.

CUDA Agent is a large-scale agentic reinforcement learning (RL) system designed to address the challenges of high-performance CUDA kernel generation for deep learning, a task traditionally requiring specialized hardware expertise. The paper highlights that despite the general proficiency of Large Language Models (LLMs) in software development, they remain uncompetitive with compiler-based systems like torch.compile for CUDA kernel optimization. Existing approaches, such as training-free refinement or fine-tuning within fixed multi-turn execution-feedback loops, are limited because they fail to fundamentally enhance the model's intrinsic CUDA optimization capabilities.

To overcome these limitations, CUDA Agent systematically improves the base model's CUDA kernel coding abilities through contributions across three complementary dimensions:

  1. Scalable Data Synthesis Pipeline:
The scarcity of high-quality, expert-level CUDA kernel implementations for training is a significant bottleneck. CUDA Agent addresses this by developing a scalable data collection pipeline to generate a vast and diverse corpus of training problems.
  • Seed Problem Crawling: Fundamental computational primitives are mined from PyTorch and Transformers libraries, establishing a comprehensive set of seed operators.
  • Combinatorial Problem Construction: LLMs are employed to synthesize aggregated operators by sequentially composing up to five sampled operator classes. This process generates fused, multi-operator tasks that often present more complex optimization landscapes than individual operations.
  • Rubric-based Problem Filtering: A rigorous execution-based filtering process ensures data quality. Problems are validated against four criteria: successful execution in both Eager and Compile modes, non-stochasticity for reproducibility, numerical distinctness for different inputs to prevent trivial solutions, and execution time within a reasonable range (1ms to 100ms in eager mode) to filter out trivial or excessively heavy tasks. Additionally, problems with high similarity to KernelBench test cases are excluded to prevent data contamination. This process yields CUDA-Agent-Ops-6K, a curated operator-level dataset.
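
The paper specifies the four rubric criteria but not their implementation. Below is a minimal Python sketch of what such an execution-based filter might look like; the helper names `make_model` and `sample_inputs` and all structural details are hypothetical, and it assumes a CUDA device and single-tensor model outputs. (The KernelBench similarity exclusion is omitted here.)

```python
import time
import torch

def passes_rubric(make_model, sample_inputs, device="cuda"):
    """Hypothetical sketch of the rubric-based problem filter.
    make_model constructs the candidate problem's module; sample_inputs
    returns a tuple of input tensors on the given device."""
    model = make_model().to(device)

    # 1) Must execute successfully in both Eager and Compile modes.
    try:
        x = sample_inputs(device)
        eager_out = model(*x)
        compiled = torch.compile(make_model().to(device))
        compiled(*x)
    except Exception:
        return False

    # 2) Non-stochastic: repeated runs on the same input must agree,
    #    so results are reproducible.
    if not torch.allclose(eager_out, model(*x)):
        return False

    # 3) Numerically distinct outputs for different inputs, so a constant
    #    or otherwise trivial kernel cannot pass the correctness check.
    y = sample_inputs(device)
    if torch.allclose(model(*x), model(*y)):
        return False

    # 4) Eager runtime within a reasonable range (1 ms to 100 ms),
    #    filtering out trivial or excessively heavy tasks.
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(10):
        model(*x)
    torch.cuda.synchronize()
    ms = (time.perf_counter() - start) / 10 * 1e3
    return 1.0 <= ms <= 100.0
```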
  2. Skill-Augmented CUDA Development Environment:
The system adopts the agent skills paradigm, providing the LLM with a structured specification and automated tools to formalize the CUDA kernel development workflow.
  • Agent Loop: The system utilizes a ReAct-style agent loop, interleaving reasoning, action execution (via standard shell utilities like BashTool, GlobTool, MultiEditTool, TodoWriteTool), and observation. This enables iterative coding, debugging, and performance optimization.
  • CUDA Coding Skill: A SKILL.md instruction file formulates a standard four-step process for CUDA kernel optimization: 1) Analyze native PyTorch performance; 2) Implement custom CUDA operators in model_new.py with corresponding CUDA kernel source and binding code; 3) Compile and iteratively refine in a GPU sandbox until correctness and performance (at least 5% speedup over torch.compile) are met; 4) Repeat optimization. CUDA-specific tools, such as a profiling tool to compare performance against torch.compile, are integrated.
  • Robust Reward Scheduling: To overcome issues with raw speedup as a reward signal (outliers, bias towards easy kernels), a normalized, robust reward scheme is introduced. The reward $r \in \{-1, 1, 2, 3\}$ is assigned based on correctness and performance (a code sketch follows this list):

$$r = \begin{cases} -1 & \text{if the correctness check fails} \\ 3 & \text{if } b(t, t_{\text{eager}}) \land b(t, t_{\text{compile}}) \\ 2 & \text{if } b(t, t_{\text{eager}}) \\ 1 & \text{otherwise} \end{cases}$$

where $t$ is the generated kernel's runtime, $t_{\text{eager}}$ and $t_{\text{compile}}$ are the runtimes of PyTorch's eager and torch.compile versions, respectively, and $b(t, t_0) = I[(t_0 - t)/t_0 > 5\%]$ indicates a significant speedup over baseline $t_0$.
  • Efforts to Avoid Reward Hacking: The system incorporates several safeguards to ensure accurate and unhackable reward signals: protected verification/profiling scripts, enforcement of execution-time constraints to prevent trivial fallbacks, validation against five randomly sampled inputs, careful profiling with synchronization/warm-up/averaging to reduce noise, and the absence of web search/external information retrieval tools.
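
Putting the reward mapping and the profiling discipline together, a minimal Python sketch might look as follows. The function names (`profile_ms`, `speedup_over`, `reward`) and the warm-up/iteration counts are illustrative assumptions, not details from the paper.

```python
import time
import torch

def profile_ms(fn, inputs, warmup=10, iters=50):
    """Careful timing: warm-up runs, device synchronization, and averaging
    to reduce measurement noise. Requires a CUDA device."""
    for _ in range(warmup):
        fn(*inputs)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*inputs)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1e3

def speedup_over(t, t0, threshold=0.05):
    """b(t, t0): True if the kernel beats baseline t0 by more than 5%."""
    return (t0 - t) / t0 > threshold

def reward(correct, t, t_eager, t_compile):
    """Map correctness and runtimes to the robust reward r in {-1, 1, 2, 3}."""
    if not correct:
        return -1
    if speedup_over(t, t_eager) and speedup_over(t, t_compile):
        return 3
    if speedup_over(t, t_eager):
        return 2
    return 1
```

In the actual system, the verification and profiling scripts are protected from modification and correctness is checked against five randomly sampled inputs; the sketch above covers only the timing and reward mapping.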
  3. RL Algorithmic Techniques for Stable Training:
Initial RL trials faced instability and performance collapse due to a severe domain distribution mismatch between the base model's learned prior and the CUDA kernel coding data.
  • Multi-Stage Warm-up: The core solution is a warm-up strategy for both the actor and critic models to adapt to the target distribution.
    • Single-Turn Warm-up: Initial RL (PPO) is performed on the base model to enhance its basic CUDA kernel generation capability.
    • Actor Initialization (Rejection Fine-Tuning, RFT): Agent trajectories generated by the single-turn RL model are collected. Rejection sampling filters these trajectories, retaining only high-quality rollouts (positive rewards and no inefficient or invalid behaviors). The filtered trajectories $D'$ are then used to optimize the actor model $\pi_\theta$ via supervised fine-tuning with the objective:

$$L_{\text{RFT}}(\theta) = -E_{\tau \sim D'} \left[ \sum_{t=1}^{T} \log \pi_\theta(a_t \mid s_t, a_{<t}) \right]$$

    • Critic Initialization (Value Pretraining): The sampled agent trajectories $D$ are used to pretrain the critic network $V_\phi$. Target values $V^{\text{targ}}_t$ are computed using Generalized Advantage Estimation (GAE): $V^{\text{targ}}_t = V_\phi(s_t) + \hat{A}_t$, where $\hat{A}_t = \sum_{l=0}^{T-1-t} (\gamma\lambda)^l \delta_{t+l}$ and $\delta_t = r_t + \gamma V_\phi(s_{t+1}) - V_\phi(s_t)$. The critic parameters $\phi$ are optimized by minimizing the mean squared error (both objectives are sketched in code after this list):

$$L_{\text{VP}}(\phi) = \frac{1}{2}\, E_{\tau \sim D} \left[ \frac{1}{T} \sum_{t=0}^{T-1} \left( V_\phi(s_t) - V^{\text{targ}}_t \right)^2 \right]$$
  • RL Algorithm: After warm-up, Proximal Policy Optimization (PPO) is employed to optimize the actor model $\pi_\theta$ using the clipped surrogate objective:

$$L_{\text{CLIP}}(\theta) = E_{\tau \sim D} \left[ \frac{1}{T} \sum_{t=0}^{T-1} \min\left( \rho_t(\theta)\, \hat{A}_t,\ \text{clip}\big(\rho_t(\theta),\, 1 - \epsilon_{\text{lower}},\, 1 + \epsilon_{\text{higher}}\big)\, \hat{A}_t \right) \right]$$

where $\rho_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}$ is the importance sampling ratio, and $a_t$ is the action (token) taken at position $t$.
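
The paper gives these objectives only as formulas; the following PyTorch sketch shows one way to compute them. The function names and the $\gamma$, $\lambda$, and $\epsilon$ defaults are illustrative assumptions, not values reported in the paper.

```python
import torch

def rft_loss(logp_tokens, mask):
    """RFT objective on filtered trajectories D': negative log-likelihood
    of the retained action tokens (averaged here for numerical stability)."""
    return -(logp_tokens * mask).sum() / mask.sum()

def gae_targets(rewards, values, gamma=0.99, lam=0.95):
    """GAE advantages A_t and value targets V_t^targ = V(s_t) + A_t.

    rewards: [T] per-step rewards r_t; values: [T+1] critic estimates
    V(s_0)..V(s_T). The backward recursion computes exactly
    A_t = sum_l (gamma*lam)^l * delta_{t+l} over the finite horizon."""
    T = rewards.shape[0]
    adv = torch.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        # delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv, values[:T] + adv  # advantages, value targets

def value_pretrain_loss(v_pred, v_targ):
    """Critic value-pretraining loss: 0.5 * MSE against the GAE targets."""
    return 0.5 * torch.mean((v_pred - v_targ) ** 2)

def ppo_clip_loss(logp_new, logp_old, adv, eps_lower=0.2, eps_higher=0.28):
    """Clipped surrogate objective with the asymmetric epsilon_lower /
    epsilon_higher bounds from the formula above (epsilon values are
    illustrative). Returns a loss to minimize, i.e. the negated objective."""
    ratio = torch.exp(logp_new - logp_old)                     # rho_t(theta)
    clipped = torch.clamp(ratio, 1 - eps_lower, 1 + eps_higher) * adv
    return -torch.mean(torch.min(ratio * adv, clipped))
```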

Scaled to a 128k-token context length and up to 200 interaction turns, CUDA Agent achieves state-of-the-art results on KernelBench: 100%, 100%, and 92% of its generated kernels run faster than torch.compile on the Level-1, Level-2, and Level-3 splits, respectively. It outperforms proprietary models such as Claude Opus 4.5 and Gemini 3 Pro by approximately 40% on the hardest Level-3 setting.