Designing, Deploying, and Animating Multimodal AI Agents
Presented by Tokyo AI (TAI)

About Event

Human dialogue is shaped by multimodal signals such as gaze, facial expressions, prosody, and timing, yet many conversational AI systems still rely on only a narrow subset of these modalities. This event features three talks on advancing multimodal and interactive dialogue agents: modeling subtle non-verbal cues, ensuring quality and reliability in high-stakes deployments such as InteLLA, and designing expressive digital characters that feel alive. Together, the talks connect foundational research with real-world system design for the next generation of conversational and interactive AI.

Who is this for?

This event is intended for researchers, engineers, and practitioners working in conversational AI, multimodal machine learning, human–computer interaction, and dialogue systems, as well as those involved in deploying AI in high-stakes or user-facing applications. It will be especially relevant to attendees interested in bridging foundational research with real-world system design, evaluation, and operations.


Agenda

18:00 Doors open

18:30 - 19:00 Beyond Words: Understanding subtle multimodal cues for AI agent interaction (Mao Saeki)

19:00 - 19:30 Towards Full-Duplex Dialogue Quality Assurance for High-Stakes Assessment Agents (Sadahiro Yoshikawa)

19:30 - 20:00 Toward Interactive Intelligence for Digital Characters (Bo Zheng)

20:00 - 21:00 Networking

21:00 Doors close

Speakers:

Talk 1 - Beyond Words: Understanding subtle multimodal cues for AI agent interaction

Speaker: Mao Saeki (Research Scientist, Equmenopolis)

Abstract: Natural human conversation is shaped by subtle non-verbal signals that are largely overlooked by today’s dialogue systems—gaze shifts, head movements, prosodic patterns, and facial expressions. In this talk, I present a body of research on leveraging such multimodal cues to enable AI agents to interact in more human-like and engaging ways. I will cover three complementary directions: predicting conversational turn-taking using visual signals such as gaze and head motion; detecting user confusion from multimodal behavioral patterns to drive adaptive conversational strategies; and eliciting active user participation through incremental confirmation of user understanding. Together, these techniques underpin InteLLA, a multimodal dialogue agent deployed at scale, and demonstrate how fine-grained multimodal cue understanding can transform passive system interactions into collaborative, natural conversations.

Bio: Mao Saeki is a founding member and Research Scientist at Equmenopolis Inc., where he leads the development of InteLLA, a multimodal virtual agent for language proficiency assessment. He is currently pursuing a Ph.D. at Waseda University. His research focuses on multimodal conversational AI, particularly the understanding and generation of non-verbal signals—including gaze, facial expressions, and prosody—to achieve natural human-agent interaction.

Talk 2 - Towards Full-Duplex Dialogue Quality Assurance for High-Stakes Assessment Agents

Speaker: Sadahiro Yoshikawa (R&D Lead, Equmenopolis)

Abstract: Equmenopolis is a Waseda University spinout startup that researches, develops, and operates InteLLA, a conversational AI agent for assessing English speaking proficiency, used by schools and other educational institutions. This talk frames the challenges unique to such multimodal agents through the lens of DevOps and MLOps and shares practical lessons learned. It also outlines key requirements for high-stakes assessment agents and introduces parts of the research frameworks we use to meet them.

Bio: Sadahiro Yoshikawa is a Research and Development Group Lead at Equmenopolis, where he leads DialOps (Dialogue System Operations). Previously, he worked as a freelance Data Engineer. His research focuses on the interaction quality of multimodal dialogue systems from the perspective of interlocutors. He is particularly interested in developing frameworks and statistical methods for measuring and ensuring reliable dialogue quality.

Talk 3 - Toward Interactive Intelligence for Digital Characters

Speaker: Bo Zheng

Abstract: Recent advances in multimodal foundation models are rapidly transforming how interactive characters are created and experienced. In this talk, I will share our work on building next-generation digital characters powered by what we call Interactive Intelligence: systems that integrate a thinker, a talker, a face animator, a body animator, and a renderer into a unified architecture. I will introduce our research platform for digital characters, including multimodal interaction, personalized text-to-speech, expressive motion, and diffusion-based rendering. Beyond system design, I will also explore a deeper and more difficult question: what does it mean for an artificial character to feel “alive” to humans? I will discuss the technical and conceptual challenges of giving interactive agents something resembling a “soul”, including personality coherence over time, emotional continuity, self-evolution, and long-term memory. These challenges sit at the intersection of AI architecture, cognitive modeling, and interactive storytelling, and may define the next frontier of digital character research.

Bio: Bo Zheng is Chief Scientist at Shanda AI Research Tokyo, where he leads research on Interactive & Spatial Intelligence for next-generation game AI and digital humans. His work spans multimodal AI, real-time character animation, conversational agents, AI-driven interactive experiences, and world models. Before joining Shanda, he held research scientist and associate professor positions at industrial and academic institutions, including the Institute of Industrial Science at the University of Tokyo and Huawei Digital Human Lab. He received a Ph.D. in Computer Vision and Graphics from the University of Tokyo and was also a visiting scholar at UCLA. His research interests include computer vision, graphics, digital humans, and human-centric interaction with AI.

Organizers

Ilya Kulyatin: Fintech and AI entrepreneur with work and academic experience in the US, the Netherlands, Singapore, the UK, and Japan, and an MSc in Machine Learning from UCL.

Supporters

Tokyo AI (TAI) is the biggest AI community in Japan, with 4,000+ members mainly based in Tokyo (engineers, researchers, investors, product managers, and corporate innovation managers).

Value Create is a management advisory and corporate value design firm offering services such as business consulting, education, corporate communications, and investment support to help companies and individuals unlock their full potential and drive sustainable growth.

Privacy Policy

We will process your email address for event-related communications and for ongoing newsletter communications. You may unsubscribe from the newsletter at any time. Further details on how we process personal data are available in our Privacy Policy.

Location

Bunkyo City, Tokyo