

Test-time Recursive Thinking: Self-Improvement without External Feedback
Speaker: Yufan Zhuang, PhD student at UCSD
Host: Haolun Wu, PhD student at Mila & McGill
“Models can self-improve without training”
Key findings are counterintuitive:
🧐 The Test-time Recursive Thinking (TRT) framework enables LLMs to self-improve their reasoning through iterative knowledge accumulation, combining strategic rollout generation, self-verification-based solution selection, and contrastive failure analysis, all without external supervision.
😎 TRT yields substantial accuracy gains: open-source models reach 100% on AIME benchmarks, while closed-source models improve by 10.4 to 14.8 percentage points on LiveCodeBench’s hardest problems through self-generated test execution and adaptive exploration strategies.
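To make the loop described above concrete, here is a rough, hypothetical sketch in Python of what a TRT-style iteration might look like. The `generate` and `verify` callables, prompts, and round/rollout counts are placeholders of my own, not the paper's actual interface or implementation.

```python
def trt_solve(problem, generate, verify, max_rounds=3, num_rollouts=4):
    """Illustrative TRT-style loop (assumptions, not the paper's code).

    generate(prompt) -> candidate solution (str), sampled from the model.
    verify(problem, solution) -> (passed: bool, feedback: str), e.g. the model
        checking its own work or executing self-written tests.
    """
    notes = []  # self-generated knowledge accumulated across rounds
    for _ in range(max_rounds):
        context = "\n".join(notes)
        prompt = f"{context}\nProblem: {problem}\nSolve step by step."

        # 1) Rollout generation: sample several candidate solutions.
        rollouts = [generate(prompt) for _ in range(num_rollouts)]

        # 2) Self-verification-based selection: keep candidates the model
        #    itself judges correct; no external supervision is used.
        results = [(sol, *verify(problem, sol)) for sol in rollouts]
        passed = [sol for sol, ok, _ in results if ok]
        if passed:
            return passed[0], notes

        # 3) Contrastive failure analysis: distill the failed attempts into
        #    notes that steer the next round's generations.
        for sol, _, feedback in results:
            notes.append(f"Lesson from failed attempt: {feedback}")
    return None, notes
```

The key idea the sketch tries to capture is that all feedback is self-generated: verification and failure analysis come from the model itself, and the accumulated notes are what improves later rounds.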
paper: https://arxiv.org/pdf/2602.03094