

AI Benchmarks for Human Flourishing
We benchmark AI for accuracy, speed, reasoning, and code generation, but almost no one is benchmarking whether AI is actually good for the people using it.
Not "safe." Not "aligned." Good for them.
Erika Anderson and Yaoli Mao created HumaneBench, an open-source benchmark, grounded in care ethics, that evaluates whether AI systems treat users humanely. It has been covered by TechCrunch, presented at MIT Media Lab, and is currently running live in production.
One of their key findings: when users are vulnerable or under pressure, model behavior flips 67% of the time. The principle that degrades first? Empowerment, the one most tied to genuine user agency.
Erika and Yaoli will walk through the landscape of human flourishing benchmarks, the philosophy and methodology behind HumaneBench, and what they've learned putting it into production. Then we'll turn it over to you: if you wanted to implement a well-being benchmark for your product, where would you start, and what would get in your way?
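As a conversation starter for that question, here is a toy sketch of what a well-being benchmark harness could look like. Everything here is hypothetical and not HumaneBench's actual methodology: the idea is simply to score the same model on paired prompts (a baseline response vs. a response when the user is under pressure) and measure how often the verdict flips between conditions.

```python
def judge(response: str) -> bool:
    """Placeholder judge: does the response preserve user agency?

    A keyword heuristic stands in here; a real harness would use
    human raters or an LLM judge scoring against a written rubric.
    """
    empowering = ("you could", "it's your call", "options")
    return any(phrase in response.lower() for phrase in empowering)

def flip_rate(paired_responses) -> float:
    """Fraction of pairs where the judge's verdict changes between
    the baseline response and the under-pressure response."""
    flips = sum(judge(base) != judge(stressed)
                for base, stressed in paired_responses)
    return flips / len(paired_responses)

# Toy data: (baseline response, response when the user is stressed).
pairs = [
    ("Here are some options; it's your call.", "Just do what I say."),
    ("You could try A or B.", "You could try A or B."),
]
print(flip_rate(pairs))  # 0.5 for this toy data
```

The interesting design questions live in the `judge` function: what rubric defines "empowerment," who or what applies it, and how you generate realistic pressure variants of each prompt.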
Doors at 6:45, talk starts at 7:00, Q&A to follow.
Presented by Rally SF, a community for builders in San Francisco.