

Voice AI Meetup
Join our next Voice AI Meetup, moderated by Kwindla from Pipecat with experts from OpenAI, Speechmatics, Tavus, and Daily.
Agenda:
Benchmark design, plus the latest in STT - Ricardo Herreros Symons, CSO, Speechmatics; Sam Sykes, Director of Innovation
Multimodal models and video agent use cases - Quinn Favret, Cofounder, Tavus
Training OpenAI’s GPT Realtime speech-to-speech models - Bo Xie, OpenAI, Member of Technical Staff
Moderated by Kwindla. Plus demos, pizza/drinks, networking and conversations with fellow AI engineers, founders, investors, and teams.
Doors open 6:30p PT in SF. Demos and fireside chats start 7:15p (we'll share the livestream link around then). More networking at 8p. Office closes 9p.
On benchmarks
One of the top questions we get from the Pipecat community and ecosystem is how to evaluate models (and how we evaluate models).
Pipecat recently released benchmarks evaluating voice AI performance. Our first benchmarks tested LLMs and STT.
Benchmarks are hard to do well and are always, at best, a simplification of reality!
We’re sitting down with the STT lab Speechmatics (which maintains a Pipecat service) to talk about benchmarks. As our Pipecat team designed and compiled our benchmarks, we worked closely with the leading labs, including Speechmatics, to get their input, feedback, and perspective.
Kwindla will continue that conversation with Ricardo Herreros Symons, Speechmatics CSO:
How "hard" should a benchmark be and what should the data mix be?
What should you really be testing? (Latency, turn detection, configurability, etc. What else?)
What data sets should you train on?
How do APIs and orchestration implementations matter?
What is the difference between reality and… marketing?
What could the next evolution of a benchmark be?
On multimodal AI
We’re also excited for a fireside chat with Quinn Favret, Tavus cofounder. Tavus has long been a leader in multimodal AI. Kwindla will talk with Quinn about the latest research and training realtime models from scratch. We'll hear from Quinn about growing agentic video use cases from startups to the enterprise.
On gpt-realtime-1.5
On Monday, OpenAI released a new speech-to-speech model. Bo Xie will sit down with us to talk about training gpt-realtime-1.5.
Your meetup hosts
Pipecat is the most widely used framework for voice agents and multimodal AI. 100% open source. Vendor neutral.
Speechmatics provides advanced Voice AI technology, delivering real-time and batch transcription across 55+ languages. Their neural models are engineered for high accuracy across accents, noisy environments, and specialized domains, powering scalable speech intelligence for enterprise applications.
Tavus is an SF-based AI research lab pioneering human computing, teaching machines the art of being human. Build, scale, and customize lifelike AI video agents for your products and workflows.
Daily provides realtime voice, video, and AI infrastructure for developers. Its engineers maintain the open source Pipecat framework.