BlueDot Impact Events
We’re building the workforce needed to safely navigate AGI. Contact: [email protected]

AI Safety Evals Reading Group

Zoom
About Event

We are reading:

AI Sandbagging: Language Models can Strategically Underperform on Evaluations

https://arxiv.org/abs/2406.07358

Each week, someone presents a paper for up to 20 minutes, followed by 40 minutes of discussion. RSVP to join, check our schedule, or volunteer to present: pick a paper from our suggested list or propose your own.
