

What evaluation frameworks exist for AI, and what's their rationale?
This session is part of the How to Think about Tech? The Case of 'AI Safety' study group, initiated by fellow candidates of the 2025/2026 Introduction to Political Technology course. It is open to faculty and fellowship candidates only.
Evaluations (or "evals") have become central to AI governance: companies use them to justify model releases, regulators require third-party assessments, and researchers design benchmarks for dangerous capabilities.
But who decides what gets measured? Whose values are embedded in evaluation design? And how do evals function in practice?
This study group session examines AI evaluation as both a technical practice and a political process, analyzing how "safety" gets operationalized through benchmarks, who holds power in defining risk, and what systematically gets excluded from evaluation frameworks.
---
Key Discussion Questions:
Who decides what capabilities or risks to evaluate?
How do evaluation frameworks shape what gets built and deployed?
What's the relationship between evals and actual safety?
Can we evaluate "societal impact"? What would that require?
How do evals function in governance?
What gets optimized when benchmarks become targets?
What's the gap between evaluation results and deployment decisions?
Recommended Readings (from the study group):
From the News
The Guardian: Experts find flaws in hundreds of AI safety and effectiveness tests
Computerworld: Testing can't keep up with rapidly advancing AI systems (International AI Safety Report coverage)
Cambridge University: Most AI agents lack basic safety disclosures
Introduction to AI Evals
Curious Beginner's Guide to AI Evaluations (technical)
Critical Perspectives
EU AI Watch: AI benchmarking: nine challenges and a way forward
Paper: AI Auditing: The Broken Bus on the Road to AI Accountability
Technical Approaches (for reference)
Anthropic: Constitutional AI: Harmlessness from AI Feedback (paper)
AI Security Institute – Frontier AI Trends report factsheet
Please read at least one of the linked items to ensure a good study group discussion. Thank you!