Advancing the Frontier of Multilingual Multimodality
Lossfunk Event Calendar
Your friendly neighborhood AI lab
80 Went
Past Event
About Event

About the talk:
Aya Vision is a family of open-weight vision-language models that brings strong multilingual performance to multimodal inputs across 23 languages. The work introduces a synthetic annotation framework and a cross-modal model-merging method that preserves text-only skills and enhances multimodal generative performance. The result is Aya-Vision-8B and Aya-Vision-32B, which report best-in-class results for their sizes and competitive win rates against much larger models on generative benchmarks.
We’ll cover:
• Two-stage training for multilingual multimodal alignment across 23 languages
• Multilingual multimodal data pipeline using synthetic instructions and translation/rewriting for quality
• Model merging to preserve strong text-only skills while improving image-grounded generation (see the sketch after this list)
• Benchmarks and results: AyaVisionBench and m-WildVision
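For context, here is a minimal sketch of what a cross-modal model merge can look like, assuming simple linear interpolation of parameters shared between a text-only checkpoint and a multimodally fine-tuned one. The exact Aya Vision recipe is what the talk covers; merge_state_dicts, alpha, and the file names below are illustrative, not Cohere's code.

import torch

def merge_state_dicts(text_sd: dict, multimodal_sd: dict, alpha: float = 0.5) -> dict:
    """Linearly interpolate parameters present in both checkpoints.

    alpha = 0.0 keeps the text-only weights; alpha = 1.0 keeps the
    multimodal weights. Parameters found only in the multimodal
    checkpoint (e.g. a vision encoder or connector) are copied over
    unchanged.
    """
    merged = {}
    for name, mm_param in multimodal_sd.items():
        if name in text_sd and text_sd[name].shape == mm_param.shape:
            merged[name] = (1.0 - alpha) * text_sd[name] + alpha * mm_param
        else:
            merged[name] = mm_param.clone()
    return merged

# Hypothetical usage:
# text_sd = torch.load("text_only_lm.pt")
# mm_sd = torch.load("vision_finetuned_lm.pt")
# model.load_state_dict(merge_state_dicts(text_sd, mm_sd, alpha=0.6))

The intuition is that interpolating toward the text-only weights retains text-only skills, while the multimodal components keep image-grounded generation working.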

About the Speaker:
Saurabh Dash is a researcher at Cohere Labs. His work focuses on multimodal models and efficiency. Before joining Cohere, he was a PhD student at Georgia Tech and a research intern at Apple AI/ML.
Website: https://saurabhdash.com
X: https://x.com/TheyCallMeMr

Location
Indiranagar
Bengaluru, Karnataka, India
The exact location will be shared after the invite is accepted.