
AI Eval Salon: Beyond Benchmarks

Hosted by Naomi Xia & 3 others
San Francisco, California
Registration
Past Event
About Event

AI systems are gaining new capabilities fast, and our evaluation methods are expanding just as quickly.

Across research labs and production teams, people are experimenting with new ways to measure generalization, reasoning, reliability, and honesty. But the field is still fragmented, and many promising ideas aren’t being shared across groups.

This salon brings together researchers, AI-native product builders, and eval tool vendors for a focused working session on what’s actually proving useful in practice.

We’ll explore questions like:

  • What does a good eval look like for modern LLMs and agents?

  • How should we measure reasoning, planning, and internal cognition?

  • Which eval setups actually predict production reliability?

  • How do synthetic, human, and scenario-based evals work together?

Expect lightning talks, real lessons from the field, and open discussion.


Hosted by MoE Labs and NEA. Attendance is capped at 20 people to keep the session technical and conversational.

Location
Please register to see the exact location of this event.
San Francisco, California
23 Went