14 Went

What evaluation frameworks exist for AI, and what's their rationale?

Hosted by sugaroverflow, Alexandra Ciocanel & Edward Saperia
Past Event
About Event

This session is part of the "How to Think about Tech? The Case of 'AI Safety'" study group, initiated by some of the fellowship candidates of the 2025/2026 Introduction to Political Technology course. It is open to faculty and fellowship candidates only.

Evaluations (or "evals") have become central to AI governance: companies use them to justify model releases, regulators require third-party assessments, and researchers design benchmarks for dangerous capabilities.

But who decides what gets measured? Whose values are embedded in evaluation design? And how do evals function in practice?

This study group session examines AI evaluation as both a technical practice and a political process, analyzing how "safety" gets operationalized through benchmarks, who holds power in defining risk, and what systematically gets excluded from evaluation frameworks.

---

Key Discussion Questions:

  • Who decides what capabilities or risks to evaluate?

  • How do evaluation frameworks shape what gets built and deployed?

  • What's the relationship between evals and actual safety?

  • Can we evaluate "societal impact"? What would that require?

  • How do evals function in governance?

  • What gets optimized when benchmarks become targets?

  • What's the gap between evaluation results and deployment decisions?

Recommended Readings (from the study group):

From the News

Introduction to AI Evals

Listen/Watch

Critical Perspectives

Technical Approaches (for reference)

Technical Deep Dives (optional)

Please read at least one of the linked items to ensure a good study group discussion. Thank you!

Location
Newspeak House
133 Bethnal Grn Rd, London E2 7DG, UK
Lounge