From Reasoning to Real Time: Infrastructure for Modern Multimodal AI

About Event

Summary

As AI systems grow more agentic, multimodal, and context-intensive, the constraints of latency, throughput, and memory efficiency increasingly define what is feasible to deploy. Long-horizon deep agents generate extensive reasoning traces, voice models require real-time responsiveness to enable fluid interaction, and emerging architectures push context lengths to unprecedented scales, all straining even the most advanced inference and training infrastructure.

This event brings together three perspectives on that challenge. The first talk examines how accelerating deep-agent pipelines transforms complex, multi-step execution from slow prototypes into production-grade systems. The second explores the system requirements behind emotionally expressive, interactive voice AI and demonstrates how low-latency pipelines unlock natural, multi-turn dialogue. The third focuses on the design principles required to train models at extreme context lengths, outlining the algorithmic and system-level choices that make such scaling tractable. Together, these talks offer a comprehensive view of how advances in speed, efficiency, and scalable architecture are reshaping what ML engineers can build today.

Schedule

18:00 Doors open

18:30 - 19:00 Going Deeper, Going Faster: How SambaNova Unlocks the Potential of Deep Agents (Kwasi Ankomah)

19:00 - 19:30 Next-Gen Emotional Voice AI in Real Time with Hume AI and SambaNova (Masahiko Nakano)

19:30 - 20:00 Design Principles for Training at Extreme Context Lengths (Stefano Massaroli)

20:00 - 21:00 Networking

21:00 Event ends

Talks

Talk 1: Going Deeper, Going Faster: How SambaNova Unlocks the Potential of Deep Agents

Speaker: Kwasi Ankomah

Abstract: The AI industry is moving beyond simple tool-calling agents toward "deep agents"—systems capable of complex planning, multi-step execution, and sustained work across extended time horizons. Powering applications like Claude Code, Manus, and Deep Research, deep agents combine planning tools, sub-agent orchestration, file system access, and sophisticated prompts to tackle tasks that shallow agents simply cannot handle. But deep agents are token-hungry. They spawn sub-agents, maintain context across sessions, and generate extensive reasoning chains—making inference speed and efficiency critical bottlenecks. This talk explores how SambaNova's blazing-fast and efficient compute platform transforms what's possible with deep agents. We'll cover practical patterns for building production-grade deep agents and demonstrate how ultra-fast inference turns theoretical capabilities into real-world applications.

Bio: Kwasi Ankomah is a Lead AI Architect at SambaNova Systems, where he leads solution efforts on generative AI, large language models, and agentic AI applications. With a background spanning the UK’s Financial Conduct Authority and the consulting and financial sectors, he brings deep expertise in applying AI to complex, regulated domains. He holds an MS in Data Science. Kwasi is passionate about AI leadership, diversity in tech, and responsible AI development. He’s a recognized voice in the AI infrastructure space, having appeared on podcasts like “The Neuron” and spoken at events including the AI Summit London, where he has discussed why inference speed is the hidden bottleneck in scaling AI agents.

Talk 2: Next-Gen Emotional Voice AI in Real Time with Hume AI and SambaNova

Speaker: Masahiko Nakano

Abstract: Voice is becoming a key interface for next-generation AI, enabling more natural and emotionally expressive interactions. Hume AI, a New York–based startup, is leading this shift with two advanced speech models: Octave, an emotionally rich text-to-speech system, and EVI, a high-fidelity speech-to-speech model that transforms vocal style while preserving intent. Both models support multiple languages, including Japanese, and open new possibilities for enterprise applications. At the same time, voice AI faces a common challenge: the need for real-time, low-latency performance to support interactive, multi-turn voice agents. This talk will show how SambaNova’s accelerated platform helps meet these requirements and unlocks the full potential of Hume AI’s models, with a live demo of expressive, real-time voice generation.

Bio: Dr. Masahiko Nakano is a Principal Solutions Engineer at SambaNova, supporting Japanese enterprises in adopting advanced AI systems. He previously worked on digital transformation in the Japanese chemical industry and has a background in quantum computational chemistry.

Talk 3: Design Principles for Training at Extreme Context Lengths

Speaker: Stefano Massaroli

Abstract: Training models with extreme context lengths poses fundamental system-level challenges in memory, computation, and numerical stability. This talk examines the core design principles necessary to support such scaling, focusing on how to decompose the problem through appropriate abstractions and how to identify and mitigate critical algorithmic and systems bottlenecks. We explore the key axes along which these challenges can be addressed, including data movement, memory hierarchy, and parallelism strategies; show how these choices in turn shape the selection of computational primitives and architectures; and discuss emerging methods that make sequence-length scaling practically tractable.

Bio: Stefano Massaroli is the Co-founder and President of Radical Numerics Inc. and a Research Scientist at RIKEN’s Deep Learning Theory Team in Tokyo. Previously, he was a Founding Scientist at Liquid AI, where he led the launch and growth of Liquid AI Japan, the company’s first subsidiary, from inception. He also completed a postdoctoral fellowship at Mila, advised by Yoshua Bengio. Stefano co-invented hybrid convolutional language models and helped pioneer neural differential equations. He earned his Master’s and PhD from the University of Tokyo.

Supporters

Tokyo AI (TAI) information

Tokyo AI (TAI) is the largest AI community in Japan, with 2,400+ members mainly based in Tokyo (engineers, researchers, investors, product managers, and corporate innovation managers).

DEEPCORE information

DEEPCORE is a VC firm supporting AI Salon Tokyo. It operates a fund for seed- and early-stage startups, as well as KERNEL, a community supporting early-stage entrepreneurs.

Location
Bunkyo City, Tokyo