Presented by
RETHINK
Curated experiences for builders shaping the future of AI.

The Future of Multimodal Intelligence and Foundation Models

Register to See Address
New York
Registration
Approval Required
Your registration is subject to host approval.
Welcome! To join the event, please register below.
About Event

Join us for an intimate, curated dinner in NYC bringing together Research and Applied Scientists working at the frontier of AI.

This evening will center on a critical shift: the rise of audio-visual and speech-native foundation models and what they unlock beyond text-first systems.

Many AI systems still rely on speech → text → reasoning → speech pipelines, but this abstraction strips away key human signals like tone, emotion, and intent. A new generation of models is emerging that treats speech as a first-class modality, enabling more natural, real-time, and context-aware interaction.

Moderated by Kurtis Voris, Senior Applied Scientist at Zillow, the conversation will focus on:

  • The evolution of audio-visual and speech-native models and multimodal alignment

  • Where current approaches break down in real-world systems

  • The challenges of scaling modern AI research into production

Guests include

  • Desh Raj - Senior Research Scientist at NVIDIA

  • Rodrigo Mira - Senior Research Scientist at Google DeepMind 

If you’re building in multimodal LLMs, speech/audio, or production AI systems, and want a high-signal, technically grounded conversation, we’d love to have you join us.


Request to attend

Request an invitation today. Space is limited.
The event is free, but an application is required. Once you have registered, our team will follow up by email to confirm attendance.


Who is in the room

  • Senior Research Scientists

  • Senior Applied Scientists

  • Senior ML Engineers working on multimodal LLMs, speech/audio, or production AI systems

  • This is a small, thoughtfully curated group, brought together to create space for depth, focus, and meaningful conversation.

What to expect

  • Panel discussion and a unique dining experience.

  • Cocktails and a three-course meal (authentic Italian food) courtesy of our partners at Zillow.

  • Meaningful discussion and real connections with Research Scientist peers facing challenges similar to yours.


Agenda

All event times are listed in PM EDT.

  • 5:00 - 5:45 | Networking + Cocktails & Appetizers

  • 5:45 - 6:45 | Panel Discussion

  • 6:45 - 8:30 | Dinner 

  • 8:30 - 9:30 | Networking & Close


About the Speakers

Desh Raj

Desh is a Senior Research Scientist at NVIDIA, where he works on multimodal AI and speech capabilities for large language models to build the next generation of conversational AI systems. Before NVIDIA, he was at Meta Superintelligence Labs, where he contributed to the first production-grade full-duplex voice agent. He has authored more than 40 research papers with over 1,500 citations, and has been recognized through the JHU-Amazon AI2AI Fellowship, the Fred Jelinek Fellowship, and IEEE Rising Star honors in Signal Processing.

Rodrigo Mira 

Rodrigo is a Senior Research Scientist at Google DeepMind in New York City, where he works on improving auditory, visual, and audio-visual speech understanding and generation. Before that, he was a Postdoctoral Researcher at Meta, where he worked on speech-driven facial animation and lip synchronization. He completed his BSc at Instituto Superior Tecnico in 2017, and his MSc and PhD at Imperial College in 2018 and 2023, respectively.

Kurtis Voris

Kurtis is a Senior Applied Scientist on the Foundational AI team at Zillow, where he develops next-event prediction transformers and efficient language models for personalization. He has spent years working across voice AI, embeddings, search, recommendations, and language modeling at organizations including real estate start-ups, AWS, Amazon Alexa, and now Zillow.

Kurtis holds a B.S. in Statistics from Cal Poly San Luis Obispo and an M.S. from San Diego State University, where he specialized in research involving time series and natural language processing. When he is not optimizing GPU job runs, he is likely practicing Olympic lifting, exploring trails in the Sierra Nevada, or managing a small farm in the foothills with his three children. 


Thank you to our partner for this event

Zillow is advancing its vision of a “housing super app” by investing in AI and research to transform how people buy, sell, rent and finance homes.

Moving beyond search, Zillow is building intelligent, agentic systems that guide users through the full journey, from dreaming to closing, within a single integrated platform. By embedding AI into real transaction workflows, Zillow connects fragmented steps into a seamless experience, turning AI from a tool for answers into one that drives real outcomes so that someday starts today.


✨ By RSVPing to this event, you agree that your registration details (including your name and email address) may be shared with the event organizers and partners for the purposes of event coordination and follow-up communication.

Location
Please register to see the exact location of this event.
New York