


Beyond Benchmarks: Ethics and AI Models in the Real World
This is a cross-post of https://www.meetup.com/dc-nlp/events/311125097/ - no need to sign up in both places. 🙂
Join us for our Responsible AI talk with Professor Patrick Hall! Big thanks to DC-NLP for organizing and MCing, and to Prefect for providing space, food, and drinks!
​Agenda:
​6:00 - 6:30 PM - Welcome and mingle
6:30 - 6:45 PM - Introductions
6:45 - 7:30 PM - Talk
7:30 - 8:00 PM - Wrap up
​Description:
​Benchmarks are useful, but it’s common sense that they can’t tell us how AI behaves in the real world. Worse, they encourage proxy games and number-chasing (Goodhart’s Law) and can be distorted by task or data contamination. What truly matters are in-situ outcomes and failure modes—privacy leaks, biased or unsafe behavior, misinformation cascades—which static leaderboards rarely reveal.
​The answer isn’t to abandon benchmarks—they're too valuable for developers—but to extend measurement beyond them: combine model-centric tests with structured red-teaming and user-driven field-testing, then apply context-aware measurement instruments to judge real impact.
​This talk unpacks the limitations of benchmarks and evals, and offers constructive steps to move past proxy games toward evidence of whether systems work safely, fairly, and reliably where it counts—the real world.