GenAI Cracow #28 - Multi-modality and local systems
Agenda
Opening Ceremony
Native Multimodality: Beyond Language-Centric Multimodal Models by Jakub
TBD by Dharm
Q&A
Quiz
Networking
Abstract
Native Multimodality: Beyond Language-Centric Multimodal Models
Traditional multimodal pipelines compromise performance by forcing audio and visual data into a compressed text-token space, permanently losing spatial structure, temporal flow, and fine-grained detail. To resolve this bottleneck, cutting-edge systems—including Kimi K2.5, SenseTime’s NEO/NEO-unify architecture, and the Gemini 1.5+ series—have converged on native multimodality, retaining raw structural context or eliminating the encoder-projector pipeline entirely. While native architectures drastically reduce data requirements and make cross-modal reasoning less brittle, they also introduce complex training dynamics and shift failure modes rather than eliminating them. Drawing on six months of empirical research, this talk evaluates where native multimodality fundamentally alters performance, outlines its persistent failure modes, and analyzes the emerging scaling behaviors defining the next generation of AI.
Speakers
Jakub Strawa
I am an AI Researcher and Research Engineer specializing in LLM training, post-training, and multimodal models, bridging the gap between cutting-edge research and scalable engineering. Currently at Stonly, I focus on developing, training, and rigorously evaluating AI agents. My background includes building enterprise-grade applications for Fortune 500 companies, conducting R&D at Roche and Raiffeisen Bank, and working on multimodal reasoning at TCL, where I collaborated directly with top-tier researchers in China and the Qwen team.
Dharm
Partners
ActiveCampaign
The marketing industry has spent decades asking humans to think like machines - configure this trigger, map that workflow, repeat. At ActiveCampaign we are building the opposite: a platform where AI handles the orchestration and our customers can focus on outcomes. Active Intelligence, our production AI layer, is what makes that real - autonomous campaign execution across email, SMS, and CRM, at the scale of tens of thousands of businesses. Our Cracow Hub is where nine engineering teams turn that ambition into working software - and where agentic workflows, RAG, MCP, and AI-assisted development are not tools we experiment with, but the way we build.
CEE AI Hub