
rStar2-Agent: Agentic Reasoning Technical Report
Key Points
- rStar2-Agent introduces a 14B math reasoning model trained with agentic reinforcement learning, enabling advanced cognitive behaviors such as careful tool use and reflection on code execution feedback.
- This capability is powered by three innovations: an efficient RL infrastructure with a reliable Python environment, GRPO-RoC (Group Relative Policy Optimization with Resampling on Correct) to manage environment noise, and an efficient multi-stage training recipe starting with non-reasoning SFT.
- rStar2-Agent-14B achieves state-of-the-art math reasoning, scoring 80.6% on AIME24 and 69.8% on AIME25, outperforming significantly larger models such as DeepSeek-R1 (671B) with minimal compute and strong generalization.
The paper introduces rStar2-Agent, a 14B math reasoning model trained with agentic reinforcement learning (RL) to achieve frontier-level performance. It highlights limitations of current long Chain-of-Thought (CoT) models on complex problems that are prone to subtle errors or require creative shifts, advocating "smarter thinking" via autonomous tool use, validation, and learning from feedback. The agentic RL environment consists of Python coding tools and a Python interpreter.
Challenges in Agentic Reinforcement Learning:
- Inherent Environment Noises: The complexity of coding tools introduces noise. Syntactically or logically incorrect code leads to error messages that can mislead the model, causing it to waste tokens on corrections rather than advancing reasoning.
- Impact of Outcome-only Reward: Current outcome-only rewards (binary accuracy of the final answer) do not penalize undesirable intermediate behaviors. This means trajectories with incorrect intermediate tool calls can still receive positive rewards if the final answer is correct, reinforcing low-quality reasoning with tool errors (observed as ~10-15% tool-related errors in correctly answered trajectories even after training).
Core Innovations of rStar2-Agent:
rStar2-Agent proposes three key innovations to address these challenges and enable effective agentic RL at scale:
- Efficient RL Infrastructure: A reliable Python code environment capable of high-throughput execution (45K concurrent tool calls, 0.3s average feedback) and a load-balanced rollout scheduler that dynamically allocates rollout requests based on KV cache capacity to maximize GPU utilization. This enables training on limited GPU resources (64 MI300X GPUs).
- GRPO-RoC Algorithm: Group Relative Policy Optimization with Resampling on Correct (GRPO-RoC), an agentic RL algorithm that specifically addresses environment noise.
- Efficient Agent Training Recipe: A multi-stage RL training strategy starting with non-reasoning Supervised Fine-Tuning (SFT) and progressing through RL stages.
Smarter Reasoning in a Code Environment (Methodology):
The model performs multi-turn rollouts, where it interacts iteratively with the code environment.
- Process: An initial system prompt and question are given. The model generates reasoning and, optionally, a tool_call. If a tool call is present, the code block is extracted and executed by an environment service, and the output (tool_response) is appended to the trajectory under the user role. The model then takes this updated context and continues reasoning under the assistant role. This repeats until a final answer is produced or the maximum number of turns is reached.
- Tool Call Format: Tool calls use a structured JSON format. The tool_response wraps standard output, IPython output, execution errors (with tracebacks), or timeouts. This API-like interface separates reasoning from execution and generalizes across tools.
- Prompt Template: The prompt instructs the model to wrap its reasoning and its final answer in designated tags, with the numeric result boxed.
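The multi-turn rollout loop described above can be sketched as follows. This is an illustrative sketch, not the paper's implementation: run_rollout, extract_tool_call, and the model/environment interfaces are hypothetical names, and the exact tool-call wire format is an assumption.

```python
import json

MAX_TURNS = 8  # hypothetical cap; the report does not state the exact limit


def extract_tool_call(text):
    """Hypothetical parser: pull a JSON tool_call object out of model output."""
    start = text.find('{"tool_call"')
    if start == -1:
        return None
    return json.loads(text[start:])["tool_call"]


def run_rollout(model, env, system_prompt, question, max_turns=MAX_TURNS):
    """Minimal multi-turn agentic rollout loop (illustrative)."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]
    for _ in range(max_turns):
        reply = model.generate(messages)           # reasoning, maybe a tool call
        messages.append({"role": "assistant", "content": reply})
        call = extract_tool_call(reply)
        if call is None:                           # no tool call -> final answer
            return messages
        result = env.execute(call["code"])         # stdout / errors / timeouts
        tool_response = json.dumps({"tool_response": result})
        # execution feedback is appended under the user role, per the report
        messages.append({"role": "user", "content": tool_response})
    return messages
```

The key design point the report emphasizes is that the model never executes code itself: execution feedback arrives as a user-role message, keeping reasoning and execution cleanly separated.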
End-to-End Agentic Reinforcement Learning (Technical Details):
Preliminary: GRPO
rStar2-Agent builds upon the Group Relative Policy Optimization (GRPO) algorithm. For each question $q$ with ground-truth answer $a$, GRPO samples a group of $G$ rollout trajectories $\{o_1, \dots, o_G\}$ from the old policy $\pi_{\theta_\text{old}}$. The policy $\pi_\theta$ is optimized by maximizing the clipped surrogate objective:

$$\mathcal{J}(\theta) = \mathbb{E}\left[ \frac{1}{G}\sum_{i=1}^{G} \frac{1}{|o_i|} \sum_{t=1}^{|o_i|} \min\!\Big( r_{i,t}(\theta)\,\hat{A}_i,\ \mathrm{clip}\big(r_{i,t}(\theta),\, 1-\epsilon_{\text{low}},\, 1+\epsilon_{\text{high}}\big)\,\hat{A}_i \Big) \right],$$

where $r_{i,t}(\theta) = \frac{\pi_\theta(o_{i,t} \mid q, o_{i,<t})}{\pi_{\theta_\text{old}}(o_{i,t} \mid q, o_{i,<t})}$ is the token-level importance ratio and $\hat{A}_i$ is the estimated advantage, computed group-relatively:

$$\hat{A}_i = \frac{R_i - \mathrm{mean}(\{R_1, \dots, R_G\})}{\mathrm{std}(\{R_1, \dots, R_G\})}.$$

The reward $R_i$ is outcome-only, based on whether the final boxed answer matches the ground truth.
Modifications from standard GRPO include: removal of the KL divergence penalty and the entropy loss, and an increased upper clipping bound $\epsilon_{\text{high}} = 0.28$ (Clip-Higher) to encourage exploration.
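The group-relative advantage and the asymmetrically clipped per-token term can be sketched in a few lines. Function names are illustrative; $\epsilon_{\text{high}} = 0.28$ follows the report's Clip-Higher setting, while $\epsilon_{\text{low}} = 0.2$ is an assumed default (the report does not state it).

```python
import math

def group_advantages(rewards):
    """Group-relative advantage: normalize each rollout's reward by the
    group mean and standard deviation, as in GRPO."""
    g = len(rewards)
    mean = sum(rewards) / g
    var = sum((r - mean) ** 2 for r in rewards) / g
    std = math.sqrt(var) or 1.0  # guard against uniform-reward groups
    return [(r - mean) / std for r in rewards]

def clipped_term(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """Per-token PPO-style clipped objective with the asymmetric
    Clip-Higher upper bound described in the report."""
    clipped = min(max(ratio, 1 - eps_low), 1 + eps_high)
    return min(ratio * advantage, clipped * advantage)
```

Because the bound is asymmetric (0.28 up vs. 0.2 down), tokens whose probability the new policy wants to raise get more headroom, which is the exploration-encouraging effect Clip-Higher targets.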
GRPO-RoC: Group Relative Policy Optimization with Resampling on Correct
To address environment noise and improve trajectory quality while retaining outcome-only rewards, GRPO-RoC introduces the Resample on Correct (RoC) rollout strategy.
- Oversampling: Instead of sampling G rollouts, a larger pool of 2G trajectories is initially sampled.
- Asymmetric Selection: The oversampled rollouts are then downsampled to G for policy updates, with different selection strategies for negative and positive samples:
  - Negative Samples: Zero-reward trajectories are uniformly sampled from the pool, preserving the diversity of failure modes.
  - Positive Samples: Successful trajectories (reward = 1) are sampled with probability *inversely proportional* to a total penalty score ptotal, prioritizing higher-quality traces. ptotal considers two types of intermediate issues:
    - Tool Call Errors (perr): penalizes erroneous tool calls; a default penalty is assigned to trajectories with no tool calls at all, encouraging tool usage.
    - Format Violations (pformat): punishes undesirable formats, such as redundant answer blocks or incorrect tag counts.
  The total penalty ptotal aggregates these two scores.
This asymmetric sampling guides the model toward cleaner, higher-quality positive trajectories with correct tool usage and formatting, while still exposing it to diverse failure modes. The final objective is the same as GRPO's but applied to the G selected rollouts.
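The RoC selection step described above might look like the following sketch. The exact half/half split between negatives and positives and the concrete weighting scheme are simplifying assumptions; only the asymmetry (uniform negatives, penalty-weighted positives) is from the report.

```python
import random

def roc_downsample(rollouts, group_size, rng=None):
    """Resample-on-Correct downsampling (illustrative sketch).
    `rollouts` is a list of (reward, p_total) pairs from an oversampled pool."""
    rng = rng or random.Random(0)
    positives = [r for r in rollouts if r[0] == 1]
    negatives = [r for r in rollouts if r[0] == 0]
    k_neg = min(len(negatives), group_size // 2)   # assumed 50/50 split
    k_pos = group_size - k_neg

    # Negatives: uniform sampling preserves diverse failure modes.
    selected = rng.sample(negatives, k_neg)

    # Positives: weight each trace inversely to its penalty p_total,
    # then draw without replacement (simple sequential weighted draw).
    pool = list(positives)
    for _ in range(min(k_pos, len(pool))):
        weights = [1.0 - p + 1e-6 for _, p in pool]
        pick = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        selected.append(pool.pop(pick))
    return selected
```

The effect is that the gradient sees clean successes preferentially, without ever changing the outcome-only reward itself.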
Large-Scale Agentic RL Infrastructure:
- Reliable High-Throughput Code Environment: Handles up to 45,000 concurrent tool calls with average feedback latency of 0.3 seconds.
- Load-Balanced Rollout Scheduler: Optimizes computational utilization by dynamically allocating rollout requests based on available KV cache capacity across GPUs.
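A greedy, capacity-aware assignment conveys the scheduling idea. The real scheduler operates dynamically over live requests and KV-cache state; the static setting, names, and units here are assumptions.

```python
import heapq

def schedule_rollouts(requests, gpu_free_kv):
    """Assign each rollout request (id, estimated KV footprint) to the GPU
    with the most free KV-cache capacity (illustrative greedy sketch)."""
    # max-heap over free capacity (negate values for heapq's min-heap)
    heap = [(-free, gpu) for gpu, free in enumerate(gpu_free_kv)]
    heapq.heapify(heap)
    assignment = {}
    for req_id, kv_need in requests:
        free, gpu = -heap[0][0], heap[0][1]
        if kv_need > free:
            continue  # would overflow every GPU's cache; defer the request
        heapq.heapreplace(heap, (-(free - kv_need), gpu))
        assignment[req_id] = gpu
    return assignment
```

Balancing on KV-cache headroom rather than request count matters because agentic rollouts have highly variable lengths, so equal request counts can still leave some GPUs idle while others thrash.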
Training Recipe:
- Non-Reasoning Cold Start for SFT: The training begins with SFT focused solely on general instruction following, coding tool usage, and formatting, without explicitly enhancing reasoning. This aims to avoid SFT overfitting and keeps initial average responses short, allowing RL to cultivate reasoning effectively.
- Multi-Stage RL Training: GRPO-RoC is applied in multiple stages, gradually increasing task difficulty and maximum training length. Each stage uses shorter rollout lengths (8K-12K) to encourage efficient reasoning strategies, unlike prior methods that scale rollouts to 16K-48K.
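The staged schedule can be sketched as a simple step-to-config mapping. The stage count, boundaries, and difficulty labels below are illustrative assumptions consistent with the report's 8K-12K range and 510 total steps, not the paper's exact configuration.

```python
# Illustrative multi-stage RL schedule: later stages raise the maximum
# rollout length and shift toward harder problems.
STAGES = [
    {"max_len": 8_000,  "difficulty": "all"},        # assumed stage 1
    {"max_len": 12_000, "difficulty": "all"},        # assumed stage 2
    {"max_len": 12_000, "difficulty": "hard-only"},  # assumed stage 3
]

def stage_for_step(step, steps_per_stage=170, stages=STAGES):
    """Map a global RL step to its stage config (hypothetical even split
    of 510 steps across 3 stages)."""
    idx = min(step // steps_per_stage, len(stages) - 1)
    return stages[idx]
```

Keeping rollout lengths short early on forces the policy to learn concise, tool-assisted reasoning rather than padding out long chains of thought.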
Results:
rStar2-Agent-14B achieves state-of-the-art math reasoning performance in only 510 RL steps within one week. On AIME24, it scores 80.6%, surpassing DeepSeek-R1 (671B), OpenAI o3-mini (medium), and Claude-Opus-4.0. It also shows strong generalization to scientific reasoning and agentic tool-use tasks beyond mathematics.