

90/30 Club (ML reading) #40: Strategist: Self-Improving LLM Decision-Making via Bi-Level Tree Search
Week 40: Strategist: Self-Improving LLM Decision-Making via Bi-Level Tree Search
The Paper Link Here
Strategist introduces a framework for helping LLM agents improve their decision-making through structured self-play, reflection, and hierarchical search. The system uses simulated trajectories, Monte Carlo tree search, and LLM-generated feedback to iteratively refine reusable strategy representations, allowing models to improve performance without human demonstrations or fine-tuning. The authors show that Strategist can outperform both traditional reinforcement learning methods and existing LLM-based improvement techniques in complex multi-agent environments like Game of Pure Strategy and Resistance: Avalon, suggesting a scalable path toward self-improving agent systems.
⭐⭐⭐ We’re excited that Jonathan, the paper’s author, will join us to present the work and discuss its implications for agent learning and LLM self-improvement.
Join us at Mox to explore:
- How bi-level tree search enables strategy-level learning
- Why self-play and trajectory reflection improve agent performance
Discussion starts at 20:00, with optional quiet reading from 19:00.