Presented by
Tokyo AI (TAI)

Liquid AI Showcase: Frontier Speech Systems, Post-Training RL, and Edge VLMs

Shibuya, Japan
Past Event
About Event

Description

The Tokyo AI (TAI) community, in collaboration with Liquid AI, presents a technical deep dive into frontier multimodal architectures and post-training methodologies. This session features technical presentations from Liquid AI scientists and engineers covering the systems-level scaling and evaluation of native, real-time speech-to-speech models, the post-training paradigms driving state-of-the-art Japanese Small Language Models (SLMs), and optimization strategies for deploying highly specialized, compact Vision-Language Models (VLMs) for structured data extraction. Attendees will gain direct insight into the foundational data, algorithmic, and engineering challenges of building high-performance, localized AI models.


Agenda

The agenda is ordered by architectural complexity and modality progression. We begin with text-based foundational methodologies (specifically LLM post-training paradigms and SLM optimization), establishing core architectural baselines. We then transition to multimodal processing with vision-language models (VLMs) focused on specialized tasks. We conclude with the highly complex, multi-component domain of native speech-to-speech systems, which require deep synchronization across speech models, specialized evaluation frameworks, and distinct localized data engineering challenges.

  • 18:00 - 18:30 | Doors Open

  • 18:30 - 18:55 | Talk 1: LLM Post-Training Trends & an Introduction to a State-of-the-Art Japanese SLM (Kohsei Matsutani)

  • 18:55 - 19:20 | Talk 2: Structured Information Extraction with Small VLMs (Hongkuan Zhang)

  • 19:20 - 20:00 | Combo Talk 3: Building Speech-to-Speech AI: Models, Evals, and Data for Spoken Interaction (Marc Härkönen, Samuel J. Broughton, Masao Taketani)

  • 20:00 - 21:00 | Technical Networking

  • 21:00 | Doors Close

Speakers

Talk 1 - LLM Post-Training Trends & an Introduction to a State-of-the-Art Japanese SLM

Abstract: In this talk, we will give an overview of current research trends, ongoing discussions, and open questions in LLM post-training. We will cover learning algorithms such as supervised fine-tuning, reinforcement learning, and on-policy distillation, as well as challenges in training reasoning models. We will also introduce LFM2.5-1.2B-JP-202606, a state-of-the-art Japanese small language model (SLM), and present an overview of its training process.

Bio: Kohsei Matsutani is a Member of Technical Staff on the Liquid AI Japan team. He is also a student at the University of Tokyo, where he conducts research on LLMs.

Talk 2 - Structured Information Extraction with Small VLMs

Abstract: In this talk, we focus on building small, specialized VLMs for a specific structured output task: given an image and a list of fields you want extracted from it, return a JSON object with those fields filled in. We share our motivation for developing these small VLMs along with their pros and cons, and walk through our case study with 1.6B and 450M models, fine-tuned on synthetically generated training data, which reach competitive performance against 4B open-source generalists. We will also introduce the usage and features of these two released models.
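
As a rough illustration of the task shape described in the abstract (a field list goes in, a JSON object with those fields comes out), here is a minimal sketch in Python. It does not call any Liquid AI model. The helper names `build_prompt` and `parse_extraction`, the prompt wording, the sample fields, and the sample output string are all hypothetical and chosen only for illustration.

```python
import json


def build_prompt(fields):
    """Build a text instruction asking for a JSON object with the given fields.

    The wording is an assumption for illustration, not a documented prompt format.
    """
    field_list = ", ".join(fields)
    return (
        "Extract the following fields from the image and reply with "
        f"a JSON object only: {field_list}"
    )


def parse_extraction(raw_output, fields):
    """Parse a model's text output and keep only the requested fields.

    Fields the model did not return are set to None, and unrequested keys
    are dropped, so the caller always receives exactly the requested keys.
    """
    data = json.loads(raw_output)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return {name: data.get(name) for name in fields}


# Hypothetical field list and hand-written stand-in for a model response.
fields = ["vendor", "date", "total"]
raw = '{"vendor": "Cafe Shibuya", "total": "1200", "extra": "ignored"}'

prompt = build_prompt(fields)
result = parse_extraction(raw, fields)
print(result)
```

In this sketch the missing `date` field comes back as `None` and the unrequested `extra` key is dropped. That is one possible way to keep the output schema fixed to the requested fields. How the released models handle missing or extra fields is not stated here.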

Bio: Hongkuan Zhang is a Member of Technical Staff at Liquid AI Japan, where he works on post-training for VLMs. He received his Ph.D. from Nagoya University, with research focusing on multimodal learning and video understanding for autonomous driving. He previously worked at Bosch Japan as an ADAS function developer.

Combo Talk 3 - Building Speech-to-Speech AI: Models, Evals, and Data for Spoken Interaction

Abstract: Audio language models are reshaping how machines understand and generate spoken interaction, pushing beyond transcription toward increasingly natural, real-time speech-to-speech systems. This talk will trace the progress behind that shift, beginning with how recent training strategies and architectures are enabling new capabilities while also introducing difficult systems challenges at scale. We will then turn to evaluation, where familiar audio tasks are giving way to more holistic measures of interactive spoken communication, raising new questions about reliability, usability, and practical deployment. Finally, we will examine the data foundations that make these systems possible, with a particular focus on the distinctive challenges of working with Japanese speech data.

LT 1 Speaker: Marc Härkönen is a machine learning scientist at Liquid AI, where he works on speech-language and multimodal models. He was previously a research scientist at Fano Labs, focusing on speech AI, and earlier held a postdoctoral position at the Max Planck Institute for Mathematics in the Sciences after completing his PhD at Georgia Tech. His background spans speech systems, machine learning, and algebraic geometry, shaping his perspective on both the research foundations and practical challenges behind modern audio language models.

LT 2 Speaker: Samuel J. Broughton is a Member of Technical Staff at Liquid AI, where he works on audio and multimodal evaluation systems for spoken AI. He was previously a Senior Machine Learning Engineer at Fano Labs, where he focused on serving speech AI at scale and designing audio systems for SaaS deployment. His research has centered on end-to-end speaker diarization, with several publications at INTERSPEECH demonstrating state-of-the-art performance in neural diarization systems, as well as broader work on speech processing and real-time conversational AI.

LT 3 Speaker: Masao Taketani is a senior ML engineer at Liquid AI, where he specializes in Japanese audio models. Previously, he worked as a Deep Learning Research Engineer focusing on generative AI R&D (including diffusion models, GANs, VAEs, LLMs, VLMs, and agentic AI) as well as recognition AI across computer vision and NLP. He earned an MS in Computer Science from the University of Tokyo, where his research focused on simulation environment generation.

Organizers

Ilya Kulyatin is an entrepreneur with work and academic experience in the US, Netherlands, Singapore, UK, and Japan. He holds a BA in Economics, an MA in Finance, and an MSc in Machine Learning. He's a 3x founder, now helping grow Japan's local AI ecosystem through a not-for-profit community, Tokyo AI (TAI), while building an AI-native system integrator and solutions provider, Foundry Labs株式会社.

Supporters

Tokyo AI (TAI) is the biggest AI community in Japan, with 4,000+ members mainly based in Tokyo (engineers, researchers, investors, product managers, and corporate innovation managers).

Privacy Policy

We will process your email address for the purposes of event-related communications and ongoing newsletter communications. You may unsubscribe from the newsletter at any time. Further details on how we process personal data are available in our Privacy Policy.

Location
Shibuya, Japan