Voice AI Pronunciation Challenges

Name: Voice AI Pronunciation Challenges
Start: 2025-09-03T18:30:00.000-07:00
End: 2025-09-03T21:00:00.000-07:00
Location: San Francisco, California

Hosted by Klaudia Guzij & 5 others

About Event

Join us for a deep dive into the pronunciation challenges that arise when deploying voice AI applications in production:

- Why TTS models struggle to pronounce certain words, such as Chipotle or acetaminophen
- What are phonemes?
- Custom pronunciation without needing to learn the International Phonetic Alphabet (IPA) or resort to brute-force spellings
- Scaling challenges
- Switching to a different TTS provider while in production
- Tradeoffs between emotional range and pronunciation consistency
- Misleading latency promises, and tactics for reducing latency
- New tools to catch pronunciation failures before your customers notice
- How to set up metrics that matter beyond time to first byte (TTFB)
- War stories from the Voice AI trenches

If you're actively building and scaling Voice AI applications, this event is for you!

Agenda:

6:30 – 7:30: Lightning Talks
- Rime: [To be announced]
- ConverseNow: The Latency Mirage: Designing for Perception, Not Milliseconds.
- Coval: Newest discoveries of benchmarking over 12 different TTS providers & what you should know before picking yours!
- Daily: [To be announced]
7:00 – 7:30: Panel Discussion on how to scale with reliable TTS performance
7:30 – 8:30: Networking

About the Hosts

Rime builds speech AI infrastructure and blends advanced ML with deep linguistic and sociolinguistic insight to create ultra-realistic multilingual voices that breathe, laugh, and carry the subtle rhythms of real everyday speech.

Coval builds the reliability infrastructure for Voice AI agents, where QA, product, and engineering teams collaborate to test, evaluate, and monitor voice agents at scale.

ConverseNow builds voice AI for restaurants, delivering a branded, fully customizable experience for each guest to improve customer satisfaction, drive spending frequency, and give restaurants control through AI Your Way.

Daily is the maintainer of Pipecat, an open source voice orchestration framework for building real-time conversational AI. Pipecat provides the low-latency infrastructure to connect speech recognition, LLMs, and text-to-speech into seamless pipelines, enabling developers to create natural, responsive, and reliable voice experiences.

Vapi provides the developer infrastructure for building and scaling voice-enabled applications. With a simple API, teams can integrate speech recognition, LLMs, and text-to-speech into production-ready voice experiences. Vapi handles the heavy lifting of telephony, latency optimization, and session management, so developers can focus on creating reliable, natural conversations rather than plumbing.