Personal

Louka Ewington-Pitsos

a paper reading club but we deep dive into benchmarks

This session: ProgramBench, on which the best SOTA model (gpt-5.5) scores just 1/200. It's growing very rapidly in GitHub stars 

https://www.star-history.com/?repos=facebookresearch%2FProgramBench%2Charbor-framework%2Fterminal-bench&type=date&legend=top-left

I'm currently training a model on this benchmark so should have some interesting insights from this as well!

Pre-reading: spend 15 minutes skimming the highly digestible  ProgramBench paper: