Presented by
Tokyo AI (TAI)

TAI AAI #13 - Embodied AI: From Seeing to Imagining to Doing

Bunkyo City, Tokyo
Registration
Approval Required
Your registration is subject to approval by the host.
Welcome! To join the event, please register below.
About Event

How Modern Robots Connect Perception to Action: Seeing, Imagining, and Doing

From self-driving cars that explain their choices to robots that plan and act in the physical world, the frontier of embodied AI is where perception meets purposeful action. This event explores how modern robots and intelligent agents bridge the gap between understanding the world and acting within it—linking vision, language, and behavior into a unified system of intelligence.

We’ll follow a simple but powerful arc:

🔹 Seeing (VLMs): language as an interface between humans and embodied AI (Roland Meertens).

🔹 Imagining (World Model): a world model as a predictive “world representation or embedding” of the physical world (Alisher Abdulkhaev).

🔹 Doing (VLAs): mapping vision-language inputs into actionable skills and policies (Motonari Kambara).

Want to gain insight into embodied AI, from conceptual introductions all the way to technical discussions?

Agenda

18:00 - 18:30 Doors Open
18:30 - 18:40 Introduction
18:40 - 19:10 Talk 1
19:10 - 19:40 Talk 2
19:40 - 20:10 Talk 3
20:10 - 21:00 Networking

Speakers

Talk 1: Seeing (VLMs): language as an interface between humans and embodied AI

Speaker: Roland Meertens (ML Engineer, Wayve)

Abstract: Understanding what your car wants to do. It's one thing to build a vehicle that drives itself autonomously through the streets of Tokyo; it's another thing to also understand why it drives the way it does. You will learn what end-to-end self-driving cars are and how you can make such a car explain the decisions it takes. Last but not least, we will see whether we can use language to probe what the car would do in hypothetical scenarios.
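For a taste of language as an interface to a driving system, here is a minimal sketch that asks an off-the-shelf visual-question-answering model questions about a driving scene. It uses a generic BLIP VQA checkpoint via the Hugging Face transformers pipeline purely for illustration; it is not Wayve's system, and street.jpg is a placeholder image path.

# Illustrative only: a generic VQA model quizzed about a driving scene.
# Not Wayve's stack; "street.jpg" is a placeholder path.
from transformers import pipeline

vqa = pipeline("visual-question-answering", model="Salesforce/blip-vqa-base")

for question in [
    "Is it safe to change lanes to the left?",
    "Why might the vehicle ahead be braking?",
]:
    result = vqa(image="street.jpg", question=question)
    print(question, "->", result[0]["answer"])

A production system would instead query the driving model's own internal representations, but the interaction pattern (natural-language question in, natural-language explanation out) is the same.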

Bio: Roland works as a machine learning engineer for Wayve in London. This year, he helped launch Wayve's first operations in Japan and bring up the Wayve driver on the Nissan Ariya. He is also good at baking pizza.

Talk 2: Imagining (World Model): a world model as a predictive “world representation or embedding” of the physical world

Speaker: Alisher Abdulkhaev (Co-founder, Kanaria Tech)

Abstract: A world model is a predictive “world representation or embedding” of the physical world that lets AI models comprehend the current world state and imagine future states. In his talk, Alisher (Co-Founder & CTO, Kanaria Tech) will touch on the essential concepts in world modelling, including how world models handle uncertainty and plan ahead rather than reacting moment to moment.
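To make the "imagining" idea concrete, below is a toy sketch of planning inside a latent world model: roll out candidate action sequences in imagination, score the imagined futures, and act on the best one. Every component here (encode, dynamics, reward, plan) is an illustrative stand-in under the usual latent-dynamics formulation, not Kanaria's KRM; a real system learns these functions from data.

# A toy "imagine then act" loop: plan by rolling out candidate action
# sequences inside a latent world model, keeping the best first action.
# All components are illustrative stand-ins, not Kanaria's KRM.
import numpy as np

rng = np.random.default_rng(0)

def encode(observation):
    # Stand-in encoder: compress a raw observation into a latent state.
    return np.tanh(observation)

def dynamics(latent, action):
    # Stand-in transition model: predict the next latent state.
    return np.tanh(latent + 0.1 * action)

def reward(latent):
    # Stand-in reward head: prefer latents close to a goal at the origin.
    return -float(np.linalg.norm(latent))

def plan(latent, horizon=5, num_candidates=64, action_dim=2):
    # Random-shooting planner: score imagined futures in latent space.
    best_score, best_first_action = -np.inf, None
    for _ in range(num_candidates):
        actions = rng.normal(size=(horizon, action_dim))
        z, score = latent, 0.0
        for a in actions:          # the rollout happens purely in imagination
            z = dynamics(z, a)
            score += reward(z)
        if score > best_score:
            best_score, best_first_action = score, actions[0]
    return best_first_action

observation = rng.normal(size=2)
print(plan(encode(observation)))   # plan ahead instead of reacting

The key point the sketch captures: the agent evaluates futures before acting, which is exactly the "plan ahead rather than react moment to moment" behavior the talk describes.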

Bio: Alisher Abdulkhaev is the Co-Founder and CTO of Kanaria Tech, where he develops the Kanaria Robotic Model (KRM), a world model-driven foundation model for social navigation in autonomous mobile robots. His work focuses on bridging embodied intelligence, world modeling, and goal-directed reasoning to enable robots to navigate and interact naturally in complex real-world environments. Alisher frequently shares insights on robotics, AI systems, and startup building through his writings on Medium and thoughts on X.

Talk 3: Doing (VLAs): mapping vision-language inputs into actionable skills and policies

Speaker: Motonari Kambara (JSPS Research Fellow, Keio University)

Abstract: This talk introduces the current capabilities and future directions of Vision-Language-Action (VLA) models that integrate perception, reasoning, and control for embodied intelligence. I will discuss how vision, language, and actions serve as complementary modalities enabling grounded understanding and purposeful behavior. The talk also highlights explainability: how VLAs enhance transparency and interpretability by aligning visual and linguistic representations of a robot's reasoning, bridging the gap between autonomous control and human understanding.
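For intuition about a VLA's interface, here is a minimal stub mapping an (image, instruction) pair to a low-level robot action. Every component is a hand-rolled placeholder, not a specific model from the talk; real VLAs (e.g. RT-2, OpenVLA) decode action tokens from large pretrained vision-language backbones.

# Illustrative stub of the VLA interface: (image, instruction) -> action.
import numpy as np
from dataclasses import dataclass

@dataclass
class Action:
    # A common low-level action space: end-effector delta pose + gripper.
    delta_xyz: np.ndarray   # (3,) translation
    delta_rpy: np.ndarray   # (3,) rotation
    gripper: float          # 0.0 = open, 1.0 = closed

def encode_image(image):
    # Stub vision encoder: flatten pixels into a fixed-size embedding.
    return image.reshape(-1)[:16].astype(np.float32)

def encode_text(instruction):
    # Stub language encoder: hash words into a fixed-size embedding.
    vec = np.zeros(16, dtype=np.float32)
    for word in instruction.lower().split():
        vec[hash(word) % 16] += 1.0
    return vec

def policy(image, instruction):
    # Fuse the modalities and decode an action; real VLAs decode
    # discretized action tokens instead of this toy projection.
    fused = np.tanh(encode_image(image) + encode_text(instruction))
    return Action(delta_xyz=fused[:3], delta_rpy=fused[3:6],
                  gripper=float(fused[6] > 0))

frame = np.zeros((64, 64, 3), dtype=np.uint8)   # placeholder camera frame
print(policy(frame, "pick up the red block"))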

Bio: Motonari Kambara is a JSPS Research Fellow at Keio University. He received his B.E., M.S., and Ph.D. in Engineering from Keio University in 2021, 2023, and 2025, respectively. From 2023 to 2025, he was also a JSPS research fellow (DC1). His research interests include vision and language, as well as robot learning.

Tokyo AI (TAI) information

TAI is the biggest AI community in Japan, with 2,700+ members mainly based in Tokyo (engineers, researchers, investors, product managers, and corporate innovation managers). Web: https://www.tokyoai.jp/

Event Supporters

DEEPCORE is a VC firm supporting AI Salon Tokyo. It operates a fund for seed- and early-stage startups, as well as KERNEL, a community supporting early-stage entrepreneurs.

Hosts

Alisher Abdulkhaev: Alisher Abdulkhaev is the Co-Founder and CTO of Kanaria Tech, where he develops the Kanaria Robotic Model (KRM), a world model-driven foundation model for social navigation in autonomous mobile robots.

Ilya Kulyatin: Fintech and AI entrepreneur with work and academic experience in the US, Netherlands, Singapore, UK, and Japan. He holds an MSc in Machine Learning from UCL.

Location
Please register to see the exact location of this event.
Bunkyo City, Tokyo