

High-Performance AI Inference: Systems, Caching, and Distributed Execution
This session dives into the low-level systems challenges of running large language models efficiently in production. Rather than focusing on model architecture or prompting, the talks explore how inference performance is shaped by infrastructure decisions: cache design, memory movement, distributed execution, and latency optimization at scale.
Agenda
18:00 Doors open
18:30 - 20:00 TBD
20:00 - 21:00 Networking, food & drinks
21:00 Doors close
Organizers
Ilya Kulyatin: Fintech and AI entrepreneur with work and academic experience in the US, Netherlands, Singapore, UK, and Japan, with an MSc in Machine Learning from UCL.
Supporters
Tokyo AI (TAI) is the biggest AI community in Japan, with 4,000+ members mainly based in Tokyo (engineers, researchers, investors, product managers, and corporate innovation managers).
Value Create is a management advisory and corporate value design firm. It offers business consulting, education, corporate communications, and investment support services that help companies and individuals unlock their full potential and drive sustainable growth.
Privacy Policy
We will process your email address for event-related communications and our ongoing newsletter. You may unsubscribe from the newsletter at any time. Further details on how we process personal data are available in our Privacy Policy.