
The Second Half

Shunyu Yao
2025.08.31
#AI #RL #LLM #Reasoning #Evaluation

Key Points

  1. The first half of AI focused on developing novel training methods and models to hillclimb benchmarks, where method innovation was prioritized over task definition.
  2. A new "recipe" combining language pre-training, scale, and reasoning has made RL generalize across tasks, effectively standardizing benchmark-solving and diminishing the impact of incremental method improvements.
  3. The second half of AI demands a shift from solving problems to defining them, focusing on fundamentally rethinking evaluation setups to prioritize real-world utility and drive truly game-changing research.

The paper posits that Artificial Intelligence (AI) is at a "halftime" inflection point, shifting from a focus on developing new training methods and models to defining real-world problems and evaluating for utility.

The first half of AI, spanning decades, was characterized by an emphasis on creating novel algorithms and model architectures (e.g., backpropagation, convolutional networks, the Transformer). Success was measured by "hillclimbing" established benchmarks (e.g., ImageNet, WMT'14). Seminal works like AlexNet or the Transformer received significantly more citations than the benchmarks they outperformed, illustrating that the game favored methods over tasks. This was because methods were generally harder to invent, more exciting, and more broadly applicable across domains (e.g., the Transformer's impact on CV, NLP, and RL). Tasks, in contrast, often involved simply adapting existing human challenges into benchmarks.

The paper argues that this game is now "ruined" by the emergence of a "recipe" that fundamentally changes the landscape. This recipe comprises:

  1. Massive language pre-training: Distilling general commonsense and linguistic knowledge into models.
  2. Scale: Utilizing vast amounts of data and compute.
  3. Reasoning and Acting (ReAct-like capabilities): Enabling agents to perform internal "thought" steps before external actions.
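
Taken together, the recipe amounts to an agent loop in which internal "thought" steps are interleaved with external actions. A minimal, self-contained sketch of such a ReAct-style loop follows; `ToyLM` and `ToyEnv` are stand-ins invented for illustration, not real APIs.

```python
class ToyEnv:
    """A 3-step corridor: the agent must move 'right' to reach the goal."""
    def reset(self):
        self.pos = 0
        return f"position {self.pos}"
    def step(self, action):
        if action == "right":
            self.pos += 1
        done = self.pos >= 3
        reward = 1.0 if done else 0.0
        return f"position {self.pos}", reward, done

class ToyLM:
    """Stub 'language model' that emits a thought, then an action."""
    def generate(self, prompt):
        if prompt.endswith("Thought:"):
            return "the goal is to the right, so I should move right"
        return "right"  # action conditioned on the preceding thought

def react_episode(lm, env, max_steps=10):
    obs, trajectory = env.reset(), []
    for _ in range(max_steps):
        # Internal step: reasoning does not change the external world.
        thought = lm.generate(f"Observation: {obs}\nThought:")
        # External step: the action is conditioned on the thought.
        action = lm.generate(f"Thought: {thought}\nAction:")
        obs, reward, done = env.step(action)
        trajectory.append((thought, action, obs, reward))
        if done:
            break
    return trajectory

traj = react_episode(ToyLM(), ToyEnv())
```

The key structural point is that the "thought" never touches `env.step`; it only shapes the action that follows.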

Framed through the lens of Reinforcement Learning (RL), the traditional view prioritized the RL algorithm (e.g., PPO, DQN) over the environment and priors. Early efforts like OpenAI Gym sought to generalize environments. However, the critical missing piece was *priors*. Language pre-training provided powerful priors, but generalization to domains like computer control or video games remained elusive.

The "eureka moment" described is the integration of reasoning as an action within the RL framework. Classical RL theory treats actions as directly affecting the external environment. However, adding internal "thinking" or "reasoning" steps—which do not immediately change the external world but leverage pre-trained language models—allows for powerful generalization. This concept is analogous to an agent being presented with "infinite empty boxes" alongside one valuable box. While classical RL might suggest this dilutes the expected reward, the paper argues that incorporating these "empty boxes" (reasoning steps) allows the agent to better prepare and choose the correct box by leveraging learned language priors. This means that:
*Language generalizes through reasoning in agents.*
With the right RL priors (from large-scale language pre-training) and an RL environment that incorporates language-based reasoning as part of its action space, the RL algorithm itself becomes "the most trivial part." This reversal of traditional RL research priorities has led to breakthroughs like the "o-series" models and computer-using agents.
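
The dilution argument above can be made concrete with a toy calculation (the setup is invented for illustration): a uniform policy's expected reward shrinks as empty reasoning actions are added to the action space, while a prior-guided policy is unaffected.

```python
def uniform_expected_reward(n_env_actions, n_reasoning_actions):
    """Uniform policy: reward is diluted by every extra (empty) action."""
    n = n_env_actions + n_reasoning_actions
    return 1.0 / n  # exactly one action yields reward 1

def prior_guided_expected_reward(n_env_actions, n_reasoning_actions):
    """A policy with a good prior ignores the dilution entirely."""
    return 1.0  # it reasons first, then picks the rewarding action

# Classical view: more empty actions dilute expected reward.
print(uniform_expected_reward(10, 0))         # 0.1
print(uniform_expected_reward(10, 990))       # 0.001
# With language priors, extra reasoning actions cost nothing:
print(prior_guided_expected_reward(10, 990))  # 1.0
```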

The second half of AI is necessitated because the "recipe" industrializes and standardizes benchmark hillclimbing, making novel methods less impactful (e.g., a 5% improvement from a novel method pales in comparison to a 30% improvement from the next scaled "o-series" model). Even harder benchmarks are rapidly solved.

Therefore, the paper advocates for a fundamental rethinking of evaluation. Instead of just creating harder benchmarks, the goal is to question existing evaluation setups and devise new ones that force the invention of methods beyond the current recipe. The author highlights the utility problem: despite AI's prowess in games and exams, its real-world economic impact is still limited. This is attributed to a disconnect between evaluation setups and real-world scenarios. Two examples are given:

  1. Automated vs. Human-engaged Evaluation: Current evaluations often assume autonomous agents with single inputs and outputs. Real-world tasks frequently require continuous human interaction (e.g., customer service chatbots). New benchmarks like Chatbot Arena or tau-bench address this by incorporating human or simulated human interaction.
  2. I.I.D. vs. Sequential/Contextual Evaluation: Machine learning benchmarks typically assume independent and identically distributed tasks. However, real-world problems (e.g., a software engineer familiarizing with a codebase) benefit from sequential learning, long-term memory, and contextual understanding, which i.i.d. evaluation fails to capture.

The proposed game for the second half is:

  1. Develop novel evaluation setups or tasks that reflect real-world utility.
  2. Solve them using the existing recipe or augment the recipe with novel components.
  3. Continue this loop.

This shift encourages research that creates new assumptions to "break" the existing recipe, leading to truly game-changing advancements focused on building useful products and driving economic value, rather than merely incremental improvements on established benchmarks.