

The Future of Multimodal AI: From Video Ads to Voice Cloning
Multimodal AI is everywhere now. Type text/URL → get a video ad. Record 10 seconds → clone any voice. Upload a product photo → generate a complete marketing campaign.
But here's the brutal truth: Most multimodal AI startups have incredible tech and no idea who actually needs it. They're solutions desperately searching for problems.
The gap between a demo and shipping to millions of users? That's where things get messy. And that's where having cool technology stops being enough.
Four founders who've actually done it will share what breaks, what scales, and what matters.
No theory. No fluff. Just real stories from the trenches.
Meet the Founders
Mahi de Silva – Co-Founder & CSO, Higgsfield AI ($100M ARR in 6 months)
Yinan (Steven) Na – Co-Founder & CEO, Creatify AI (#1 AI Video Ad Platform)
Turning text into video ads at scale. $15.5M Series A from top VCs. $9M ARR. Ex-Snap & Meta engineering leader.
Emmie Chang – Co-Founder @ Yuzu Labs | YC W14 | Serial Entrepreneur
YC-backed founder. Has built enough AI products, and seen enough of them fail, to know what actually matters.
Rissa Cao – Co-Founder & CEO, Fish Audio & 39 AI | Ex-PixAI/Meta/Amazon
Building full-stack audio infrastructure for next-gen entertainment and AGI. The Audio Foundation Lab behind Fish Audio: ultra-real, emotionally expressive voices at under 150 ms latency, powered by the top open-source TTS team (100K+ stars; #1 audio repo). Grew from open-source roots to ~$7.5M ARR, proving that world-class voice infrastructure can deliver breakthrough quality at a fraction of ElevenLabs' and Cartesia's cost.
Moderated by Cloris L – Data Science @ Meta | AI Community Builder
What You'll Learn
✨ The real cost of shipping multimodal AI (spoiler: it's not just compute)
🔧 Production war stories – what breaks when you hit scale
💰 How to build a business when models change every 3 months
🚀 Distribution secrets – getting users in a crowded market
🔮 What's next – where the biggest opportunities still are
Join us to learn the hard-won lessons of deploying multimodal AI and get a look at where the industry is heading.
Event Details
📅 Date: Dec 5th
🕐 Time: 4:30 PM - 7:30 PM
📍 Location: Stanford
Event Schedule:
4:30 PM - 5:00 PM: Networking and Dinner
5:00 PM - 5:30 PM: Warm-up and Presentation from Wan
5:30 PM - 6:15 PM: Panel and Q&A
6:15 PM onwards: Socializing and Continued Conversations
Partner
Wan – An AI platform that turns text and images into animated short-form videos in seconds, lowering the barrier to creative work with AI.
Meet your Host!
FounderCoHo is a community where founders support each other by sharing knowledge and building connections. Starting a company is a brave endeavor. Many of us have faced numerous challenges and learned valuable lessons along the way. The founder's journey can feel lonely, because it's difficult to discuss its nuances with people who haven't experienced it firsthand. FounderCoHo aims to be that supportive community, where founders can share their stories and get fueled for their company-building journey.
Where to find us:
YouTube
Substack