

AI Evals for Government
AI evaluations have become a critical foundation for the responsible use of AI in government, but existing approaches are struggling to keep up. From questions of validity and robustness to the gap between benchmarks and real-world public sector needs, AI evals are under increasing pressure.
In this event, co-hosted by the Centre for Digital Governance at the Hertie School, GovTech Deutschland, and the Weizenbaum Institute, researchers and practitioners explore the practical role AI evaluations play in government today, what is going wrong with current evaluation practices, and how those practices can evolve to meet the strict requirements of the public sector.
Schedule
Doors open: 18:00
Program start: 18:30
Presentation: Möve: An LLM benchmark for the German Public Sector – Thilo Michael, Senior Data Scientist at Bundesdruckerei
Fireside Chat: AI Evals in Crisis? With Kenneth Enevoldsen – Researcher and First Author of the "Massive Multilingual Text Embedding Benchmark"
Q&A & Discussion
Program end: 20:00