Learning to Discover at Test Time

James Zou
2026.01.31
· arXiv · by 네루
#LLM #Reinforcement Learning #Test-Time Training #AI Discovery #Open Model

Key Points

  • TTT-Discover proposes a novel method that applies reinforcement learning at test time to continually train a large language model (LLM) for scientific discovery, prioritizing the generation of a single, highly optimized solution rather than broad generalization.
  • The approach pairs an entropic objective, which prioritizes maximum-reward actions over expected reward, with a PUCT-inspired state-reuse mechanism, enabling the LLM to adapt and learn from its own attempts on a specific, out-of-distribution problem.
  • TTT-Discover achieves new state-of-the-art results across diverse domains, including mathematics, GPU kernel engineering, and algorithm design, using an open model and cost-effective training, with significant improvements over prior methods.

This paper introduces Test-Time Training to Discover (TTT-Discover), a novel approach that uses reinforcement learning (RL) to continually train a large language model (LLM) at test time to solve challenging scientific discovery problems. Unlike prior methods that prompt a frozen LLM for search (e.g., AlphaEvolve), TTT-Discover allows the LLM to adapt and improve its internal representations based on experience specific to the target problem. The core motivation is that discovery problems demand solutions beyond an LLM's pre-training data, requiring learning during the problem-solving process itself.

The paper formalizes a scientific problem as an environment, specifically a Markov Decision Process (MDP), characterized by a text description $d$, a candidate solution state $s$, an action $a$ (generated by the LLM), a transition function $T(a)$ producing a new state $s'$, and a continuous reward function $R(s')$. A "discovery" is defined as finding a state $s$ such that $R(s) > r_{\text{sota}}$, where $r_{\text{sota}}$ is the reward of the current best-known solution. Actions typically involve generating thinking tokens and code, which the environment parses and executes to yield a new state $s'$.
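To make the formulation concrete, here is a minimal Python sketch of such an environment; the class and function names (`DiscoveryEnv`, `is_discovery`) are illustrative and not taken from the paper's code.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class DiscoveryEnv:
    """One scientific problem cast as the paper's MDP formulation."""
    description: str                  # problem description d
    transition: Callable[[str], str]  # T(a): parse/execute the LLM action, return new state s'
    reward: Callable[[str], float]    # R(s'): continuous score of a candidate solution

    def step(self, action: str) -> tuple[str, float]:
        """Apply an LLM-generated action (thinking + code) and score the resulting state."""
        new_state = self.transition(action)
        return new_state, self.reward(new_state)

def is_discovery(reward: float, r_sota: float) -> bool:
    """A 'discovery' is any state whose reward beats the best-known solution."""
    return reward > r_sota
```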

Traditional search methods like Best-of-N sample i.i.d. rollouts from a frozen $\pi_\theta$. More advanced methods, such as evolutionary search (e.g., AlphaEvolve), employ state-action reuse by maintaining a buffer $H_i$ of previous attempts and using heuristics (reuse) to select an initial state $s_i$ and context $c_i$ for the LLM's next generation $a_i \sim \pi_\theta(\cdot \mid d, s_i, c_i)$. However, these methods do not update the LLM's weights $\theta$.
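For contrast with TTT-Discover, a Best-of-N baseline with a frozen policy can be sketched as follows, reusing the hypothetical `DiscoveryEnv` above; `sample_action` is a stand-in for prompting the frozen LLM.

```python
def best_of_n(env: DiscoveryEnv, sample_action, n: int):
    """Best-of-N: draw n i.i.d. rollouts from a frozen policy and keep the best state.

    sample_action(description) -> str stands in for prompting the frozen LLM;
    no buffer is kept and no weights are updated.
    """
    best_state, best_reward = None, float("-inf")
    for _ in range(n):
        action = sample_action(env.description)      # i.i.d. samples from pi_theta
        state, reward = env.step(action)
        if reward > best_reward:
            best_state, best_reward = state, reward
    return best_state, best_reward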

TTT-Discover addresses the limitations of both frozen LLMs and standard RL for discovery problems. Standard RL aims to maximize expected reward and produce a generalizable policy, which is misaligned with the discovery goal of finding a single, maximal solution. Specifically, naive RL's objective function is indifferent to the maximum reward, its fixed initial state distribution limits the effective horizon, and its exploration strategies can favor safe actions over potentially groundbreaking but riskier ones.

TTT-Discover's methodology, outlined in Algorithm 1, is an iterative process (a code sketch follows the list):

  1. Initialize a buffer $H_0$ with an empty solution.
  2. For $i = 0, \ldots, N-1$:
a. Select an initial state $s_i$ and context $c_i$ from $H_i$ using a reuse heuristic.
b. Generate an action $a_i \sim \pi_{\theta_i}(\cdot \mid d, s_i, c_i)$.
c. Transition to state $s'_i = T(a_i)$ and evaluate its reward $r_i = R(s'_i)$.
d. Add $(s_i, a_i, s'_i, r_i)$ to $H_i$.
e. Update the policy weights $\theta_{i+1}$ from $\theta_i$ using a train subroutine.
  3. Return the state $s'_i$ with the highest reward found across all iterations.
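A hedged Python sketch of this loop, with a single rollout per iteration for brevity; `reuse`, `policy.generate`, and `train` are placeholders for the paper's subroutines rather than its released implementation.

```python
def ttt_discover(env: DiscoveryEnv, policy, reuse, train, n_iters: int):
    """Test-time training loop: search and weight updates interleaved on one problem.

    reuse(buffer)               -> (s_i, c_i): pick an initial state and context (PUCT-inspired)
    policy.generate(d, s, c)    -> action a_i sampled from the current weights
    train(policy, buffer)       -> updated policy (entropic-objective gradient step)
    """
    buffer = [("", None, "", float("-inf"))]              # H_0: start from an empty solution
    best_state, best_reward = "", float("-inf")
    for _ in range(n_iters):
        s_i, c_i = reuse(buffer)                          # a. select initial state and context
        a_i = policy.generate(env.description, s_i, c_i)  # b. sample an action from pi_theta_i
        s_next, r_i = env.step(a_i)                       # c. transition and evaluate reward
        buffer.append((s_i, a_i, s_next, r_i))            # d. grow the experience buffer H_i
        policy = train(policy, buffer)                    # e. one policy update at test time
        if r_i > best_reward:
            best_state, best_reward = s_next, r_i
    return best_state, best_reward                        # highest-reward state found
```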

The key innovations in TTT-Discover lie in its specialized train and reuse subroutines:

  1. Entropic Objective ($J_\beta(\theta)$): To prioritize maximum-reward actions, TTT-Discover optimizes an entropic objective (a numerical sketch follows this list):
$$J_\beta(\theta) = \mathbb{E}_{s \sim \text{reuse}(H)}\left[\log \mathbb{E}_{a \sim \pi_\theta(\cdot \mid s)}\left[e^{\beta(s) R(s, a)}\right]\right]$$
The temperature parameter $\beta(s)$ is set adaptively per initial state $s$ by constraining the KL divergence of the induced policy (details in Appendix A.1). As $\beta \to \infty$, this objective asymptotically approaches maximizing the maximum reward. The gradient is computed as:
$$\nabla_\theta J_\beta(\theta) = \mathbb{E}_{s \sim \text{reuse}(H),\, a \sim \pi_\theta(\cdot \mid s)}\left[ w_\beta(s)(a)\, \nabla_\theta \log \pi_\theta(a \mid s) \right]$$
where $w_\beta(s)(a) = \frac{e^{\beta(s) R(s, a)}}{\mathbb{E}_{\tilde{a} \sim \pi_\theta(\cdot \mid s)}\left[e^{\beta(s) R(s, \tilde{a})}\right]}$ serves as a re-weighting term for actions. Advantages are shaped with a KL penalty: $A(a; s) = w_\beta(s)(a) - 1 - \lambda \log \frac{\pi_\theta(a \mid s)}{\pi_{\theta_0}(a \mid s)}$.

  2. PUCT-inspired Reuse: To select initial states $s_i$ from the buffer $H_i$, TTT-Discover employs a PUCT-inspired rule (also sketched after this list):
$$\text{Score}(s) = Q(s) + c \cdot P(s) \cdot \sqrt{\frac{1 + T}{1 + n(s)}}$$
Critically, $Q(s)$ is defined as the *maximum* reward achieved by any descendant generated when starting from $s$, not the average. $P(s)$ is proportional to the rank of $s$ in the buffer sorted by reward, $n(s)$ is the number of times $s$ or its descendants have been expanded, and $T$ is the total number of expansions. This heuristic balances exploitation of high-potential states with exploration of under-visited ones.
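A minimal numpy sketch of the re-weighting and advantage shaping for a single initial state $s$, estimating the inner expectation with the batch mean over sampled actions; the variable names are mine, not the paper's.

```python
import numpy as np

def entropic_advantages(rewards, logp_cur, logp_ref, beta: float, lam: float):
    """Re-weighting and KL-shaped advantages for one initial state s.

    rewards  : R(s, a) for a batch of actions sampled from the current policy
    logp_cur : log pi_theta(a | s) under the current weights
    logp_ref : log pi_theta_0(a | s) under the initial (reference) weights
    beta     : temperature beta(s); larger values concentrate weight on the max reward
    lam      : KL-penalty coefficient lambda
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    # w_beta(a) = exp(beta * R) / E_a~pi[exp(beta * R)], estimated by the batch mean;
    # subtract the max before exponentiating for numerical stability.
    z = beta * (rewards - rewards.max())
    w = np.exp(z) / np.exp(z).mean()
    # A(a; s) = w_beta(a) - 1 - lambda * log(pi_theta(a|s) / pi_theta_0(a|s))
    return w - 1.0 - lam * (np.asarray(logp_cur) - np.asarray(logp_ref))
```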
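And a sketch of the PUCT-inspired selection over buffer states; the rank-based prior below is one plausible reading of "$P(s)$ proportional to the rank", so treat it as an assumption rather than the paper's exact rule.

```python
import numpy as np

def puct_select(q_max, n_visits, c: float = 1.0) -> int:
    """Pick the index of the next initial state to expand from the buffer.

    q_max    : per-state maximum reward achieved by any descendant of s (Q(s))
    n_visits : per-state expansion counts n(s)
    c        : exploration constant
    """
    q_max = np.asarray(q_max, dtype=np.float64)
    n_visits = np.asarray(n_visits, dtype=np.float64)
    total = n_visits.sum()                      # T: total expansions so far
    # P(s): prior proportional to the state's rank when sorted by reward
    # (best state gets the largest prior); normalized to sum to 1.
    ranks = q_max.argsort().argsort() + 1       # 1 = worst reward, len = best reward
    prior = ranks / ranks.sum()
    scores = q_max + c * prior * np.sqrt((1.0 + total) / (1.0 + n_visits))
    return int(scores.argmax())
```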

TTT-Discover is implemented using gpt-oss-120b with LoRA fine-tuning (rank 32) on Tinker. Each run involves 50 training steps, with 512 rollouts per step (8 groups of 64). The train step involves one gradient update on the entire batch.
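The reported setup can be summarized in a small configuration sketch; the values are the ones stated above, while the key names are illustrative rather than taken from the released code.

```python
# TTT-Discover setup as reported in the paper (key names are illustrative)
ttt_discover_config = {
    "base_model": "gpt-oss-120b",
    "finetuning": {"method": "LoRA", "rank": 32, "platform": "Tinker"},
    "train_steps": 50,               # RL training steps per problem
    "rollouts_per_step": 512,        # 8 groups of 64 rollouts
    "groups_per_step": 8,
    "rollouts_per_group": 64,
    "gradient_updates_per_step": 1,  # one update on the entire batch
}
```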

The paper demonstrates TTT-Discover's effectiveness across mathematics (Erdős' minimum overlap problem, autocorrelation inequalities), GPU kernel engineering, algorithm design (AtCoder competitions), and biology (single-cell analysis denoising). TTT-Discover achieves new state-of-the-art results in almost all attempted problems. For instance, in Erdős' minimum overlap problem, it improved the upper bound to 0.380876, surpassing AlphaEvolve's 0.380924. In the first autocorrelation inequality, it set a new upper bound of 1.50286, beating ThetaEvolve's 1.50314. These improvements are often achieved by discovering qualitatively new constructions (e.g., an asymmetric 600-piece step function for Erdős' problem), in contrast with prior work that refined existing constructions. The reported results are achieved with an open-source model and can be reproduced with public code, costing a few hundred dollars per problem.