
AI Evals w/ Alex Gu: Evaluating AI Systems on Mathematical and Coding Tasks

Hosted by Vals AI & alphaXiv
Zoom
About Event

🔬 AI Evals on AlphaXiv

🗓 Wednesday, November 5th, 2025 · 11 AM PT

🎙 Featuring Alex Gu

💬 Moderated Discussion + Q&A

​AI Evals Series: Evaluating AI Systems on Mathematical and Coding Tasks

We're excited to host Alex Gu, a PhD student at MIT whose research focuses on evaluating and improving AI systems on programming and on both formal and informal mathematical reasoning. Alex has been involved in creating widely used benchmarks and tools, such as LiveCodeBench, BigCodeBench, CRUXEval, LeanDojo, and IneqMath. In this session, Alex will share insights on how evaluations can inform our perspective on AI capabilities and explore the challenges that today's AI models face on math and code tasks.

This event is virtual. The Zoom link will be shared upon registration. The talk will later be uploaded to AlphaXiv's YouTube channel.

​Hosted by: alphaXiv x Vals AI
