
AI Evaluations - a technical primer workshop

Hosted by sugaroverflow & Edward Saperia
Registration
Approval Required
Your registration is subject to host approval.
About Event

This workshop is part of the How to Think about Tech? The Case of 'AI Safety' study group, initiated by some of the fellowship candidates of the 2025/2026 Introduction to Political Technology course. It is open to faculty and fellowship candidates only; external guests may attend by request.

AI companies release new models constantly, often claiming each new one is "safer" than the last. But how do they actually know? What tests do they run? What gets measured and evaluated?

This workshop provides a technical primer on AI evaluations (evals) - the practice of testing AI systems for capabilities, safety, and fairness. We'll explore:

  • What evaluations are and why they matter for AI governance

  • Different types of evaluations: capability, alignment, safety, and societal impact assessments

  • How evals work technically: constructing datasets, defining metrics, and designing evaluation protocols (see the sketch after this list)

  • Examples that demonstrate the mechanics in practice

  • (Optional) A deep dive on a real evaluation framework to see how it works in deployment decisions
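
For orientation, here is a minimal sketch in Python of the three moving parts named in the list: a dataset of prompt/reference pairs, a metric, and a protocol that runs the system under test over the dataset and aggregates scores. The model function here is a hypothetical stand-in, not any particular API; real eval harnesses add sampling settings, multiple metrics, and statistical reporting.

def model(prompt: str) -> str:
    """Hypothetical system under test; replace with a real model call."""
    return "Paris" if "capital of France" in prompt else "unknown"

# 1. Construct a dataset: input prompts paired with reference answers.
dataset = [
    {"prompt": "What is the capital of France?", "expected": "Paris"},
    {"prompt": "What is the capital of Japan?", "expected": "Tokyo"},
]

# 2. Define a metric: here, simple exact-match accuracy.
def exact_match(prediction: str, expected: str) -> bool:
    return prediction.strip().lower() == expected.strip().lower()

# 3. Run the evaluation protocol: query the model on every item and aggregate.
scores = [exact_match(model(item["prompt"]), item["expected"]) for item in dataset]
accuracy = sum(scores) / len(scores)
print(f"Exact-match accuracy: {accuracy:.2f}")

Even in this toy form, the design choices the workshop examines are visible: what goes in the dataset, what the metric counts as "correct", and how results are aggregated all shape what the evaluation can and cannot tell you.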

We'll close with a hands-on activity: designing evaluations for a specific scenario to surface the tradeoffs between what we can measure and what actually matters.

This workshop pairs with the AI Safety study group discussion on AI evals on Feb 22 (https://luma.com/357o5glx), which examines the governance and political implications of evaluation frameworks.

Location
Newspeak House
133 Bethnal Grn Rd, London E2 7DG, UK
Classroom