
AI Eval Salon: Beyond Benchmarks

Hosted by Naomi Xia & 3 others
San Francisco, California
Registration
Past Event
About Event

AI systems are gaining new capabilities fast, and our evaluation methods are expanding just as quickly.

Across research labs and production teams, people are experimenting with new ways to measure generalization, reasoning, reliability, and honesty. But the field is still fragmented, and many promising ideas aren’t being shared across groups.

This salon brings together researchers, AI-native product builders, and eval tool vendors for a focused working session on what’s actually proving useful in practice.

We’ll explore questions like:

  • What does a good eval look like for modern LLMs and agents?

  • How should we measure reasoning, planning, and internal cognition?

  • Which eval setups actually predict production reliability?

  • How do synthetic, human, and scenario-based evals work together?

Expect lightning talks, real lessons from the field, and open discussion.


Hosted by MoE Labs and NEA. Attendance is capped at 20 people to keep the session technical and conversational.

Location
Please register to see the exact location of this event.
San Francisco, California
23 Went