

The Future of Multimodal Intelligence and Foundation Models
Join us for an intimate, curated dinner in NYC bringing together Research and Applied Scientists working at the frontier of AI.
This evening will center on a critical shift: the rise of audio-visual and speech-native foundation models and what they unlock beyond text-first systems.
While many AI systems still rely on speech → text → reasoning → speech pipelines, this abstraction strips away key human signals like tone, emotion, and intent. A new generation of models is emerging that treats speech as a first-class modality, enabling more natural, real-time, and context-aware interaction.
Moderated by Kurtis Voris, Senior Applied Scientist at Zillow, the conversation will focus on:
The evolution of audio-visual and speech-native models and multimodal alignment
Where current approaches break down in real-world systems
The challenges of scaling modern AI research into production
Guests include:
Desh Raj - Senior Research Scientist at NVIDIA
Rodrigo Mira - Senior Research Scientist at Google DeepMind
If you’re building multimodal LLMs, speech/audio systems, or production AI systems, and want a high-signal, technically grounded conversation, we’d love to have you join us.
Request to attend
Request an invitation today. Space is limited.
The event is free, but an application is required. Once you have registered, our team will follow up by email to confirm your attendance.
Who is in the room
Senior Research Scientists
Senior Applied Scientists
Senior ML Engineers working on multimodal LLMs, speech/audio, or production AI systems
This is a small, thoughtfully curated group, brought together to create space for depth, focus, and meaningful conversation.
What to expect
Panel discussion and a unique dining experience.
Cocktails and a three-course authentic Italian meal, courtesy of our partners at Zillow.
Meaningful discussion and real connections with research scientist peers facing challenges similar to yours.
Agenda
All event times are listed in PM EDT.
5:00 - 5:45 | Networking + Cocktails & Appetizers
5:45 - 6:45 | Panel Discussion
6:45 - 8:30 | Dinner
8:30 - 9:30 | Networking & Close
About the Speakers
Desh Raj
Desh is a Senior Research Scientist at NVIDIA, where he works on multimodal AI and speech capabilities for large language models to build the next generation of conversational AI systems. Before NVIDIA, he was at Meta Superintelligence Labs, where he contributed to the first production-grade full-duplex voice agent. He has authored more than 40 research papers with over 1,500 citations, and has been recognized through the JHU-Amazon AI2AI Fellowship, the Fred Jelinek Fellowship, and IEEE Rising Star honors in Signal Processing.
Rodrigo Mira
Rodrigo is a Senior Research Scientist at Google DeepMind in New York City, where he works on improving auditory, visual, and audio-visual speech understanding and generation. Before that, he was a Postdoctoral Researcher at Meta, where he worked on speech-driven facial animation and lip synchronization. He completed his BSc at Instituto Superior Técnico in 2017, and his MSc and PhD at Imperial College in 2018 and 2023, respectively.
Kurtis Voris
Kurtis is a Senior Applied Scientist on the Foundational AI team at Zillow, where he develops next-event prediction transformers and efficient language models for personalization. He has spent years working across voice AI, embeddings, search, recommendations, and language modeling at organizations including real estate start-ups, AWS, Amazon Alexa, and now Zillow.
Kurtis holds a B.S. in Statistics from Cal Poly San Luis Obispo and an M.S. from San Diego State University, where he specialized in research involving time series and natural language processing. When he is not optimizing GPU job runs, he is likely practicing Olympic lifting, exploring trails in the Sierra Nevada, or managing a small farm in the foothills with his three children.
Thank you to our partner for this event
Zillow is advancing its vision of a “housing super app” by investing in AI and research to transform how people buy, sell, rent, and finance homes.
Moving beyond search, Zillow is building intelligent, agentic systems that guide users through the full journey, from dreaming to closing, within a single integrated platform. By embedding AI into real transaction workflows, Zillow connects fragmented steps into a seamless experience, turning AI from a tool for answers into one that drives real outcomes so that someday starts today.
✨ By RSVPing to this event, you agree that your registration details (including your name and email address) may be shared with the event organizers and partners for the purposes of event coordination and follow-up communication.