

Join us on our #london channel in the MLOps Community Slack: https://mlops.community/join/
The Hidden Layers of AI Systems: Memory, Retrieval, and Search
Most AI systems don’t fail because of the model alone. They fail because of memory, retrieval, context, and evaluation.
Agents accumulate too much context and lose signal. RAG pipelines degrade when retrieval quality is poor. Search systems return results, but not always the right ones. Coding agents struggle to preserve intent across long-running sessions. And evaluating whether an AI system actually found the right information remains one of the hardest problems in production.
This MLOps Community London meetup, sponsored by Qdrant, explores what it takes to build reliable AI systems beyond the model layer.
We’ll hear from engineers working on real-world AI infrastructure, agent memory, retrieval systems, coding agents, and production AI workflows, followed by a speaker panel, audience Q&A, and time to connect with other builders in London.
Topics We’ll Explore
Agentic Memory Architecture
Why memory matters in production AI systems
Avoiding context bloat in long-running agents
When agents should retrieve, remember, or forget
Practical patterns for persistent memory systems
Retrieval and Context Quality
Why naive retrieval leads to brittle AI systems
How to preserve useful context across tasks and sessions
Vector search, hybrid retrieval, and structured specifications
Designing memory layers that scale beyond a single prompt
Evaluation and Reliability
How to detect drift, fake completion, and degraded context
Why “the agent finished” does not always mean the task is done
Evaluating retrieval quality beyond surface-level metrics
Designing AI systems that are easier to inspect, debug, and trust
AI-Native Engineering Workflows
How coding agents use context, tools, and memory
Building harnesses for long-running agent tasks
Declarative context architecture vs retrieval-heavy workflows
What reliable autonomous engineering systems might look like
Agenda
5:30 – 6:15 PM | Doors Open: Arrival & Networking
Check in, grab food and drinks, and meet other engineers, founders, and builders working on AI systems.
6:15 – 6:25 PM | Welcome & Kickoff
Opening remarks from MLOps Community London, including a special community announcement.
6:25 – 6:50 PM | Grow the Data, Shrink the Bill
Agent efficiency through vector search.
AI agents waste tokens in four common ways: context accumulation, payload inefficiency, generative waste, and failure recovery. Stuffing the context window with everything can work, but it is often 50 to 100x more expensive than targeted retrieval.
This talk explores how vector embeddings can increase information density so an orchestrator works only with retrieved context from a vector store. Ewa will share a benchmark experiment across SWE-Bench, LongMemEval, and FinanceBench, comparing agentic grep, Qdrant BM25 sparse retrieval, and Qdrant dense retrieval, with tokens-to-resolution as the primary metric.
🎤 Ewa Szyszka, DevRel Engineer at Qdrant
Ewa Szyszka is a DevRel Engineer at Qdrant, focused on technical education, developer community, and applied AI systems. She has worked across GenAI evaluations, multi-agent systems, AI gateways, observability, and RAG, with previous DevRel roles at orq.ai and Kilo Code. She has also organised large technical meetups and built tools across cloud, data, and AI workflows.
6:50 – 7:10 PM | Building Your Own Memory Layer for Coding Agents
Coding agents struggle with long-running context, repository-scale retrieval, and maintaining architectural intent across sessions.
This talk explores practical patterns for designing memory systems using vector search, hybrid retrieval, and structured specifications. Shashi will share lessons from building a spec-driven external memory architecture with Qdrant and SpecMem, along with practical guidance for developers building scalable memory layers for their own AI coding agents.
🎤 Shashi Jagtap, Founder at Superagentic AI
Shashi Jagtap is the founder of Superagentic AI, where he builds developer tools and frameworks for production-grade agentic AI systems. He previously spent nearly six years at Apple working on Xcode and developer tooling, and has close to 20 years of experience across DevEx, automation, mobile infrastructure, and agent engineering. He also organises the London Agentic AI community.
7:10 – 7:20 PM | Break 🍕🍺
Refuel, connect, and continue the conversation.
7:20 – 7:40 PM | Context, Environment, Knowledge: Three Layers on the Road to Autonomous Engineering
Model evolution is a catalyst, not the cure.
This talk covers memory patterns that actually work for long-running, multi-agent tasks: how to avoid drift, how to catch “fake done”, and how context, environment, and knowledge become building blocks on the road to a software factory.
🎤 Dmytro Yaroshenko, Applied AI Engineer at Factory AI
Dmytro Yaroshenko is an Applied AI Engineer at Factory AI, working on practical AI systems for autonomous software engineering. He previously worked at Contextual AI, Snowflake, and Deutsche Bank, with experience spanning applied AI, data engineering, solution architecture, and strategic enterprise AI adoption. His work focuses on moving AI systems from experimentation into reliable production workflows.
7:40 – 8:00 PM | Stop Retrieving. Start Declaring.
A codebase is already a memory system. Every file, folder structure, and architectural decision is context that agents consistently fail to read.
This talk makes the case that retrieval is often a workaround for missing structure, and introduces the declarative context architecture pattern: scoped context files, task blueprints, and a stateful memory harness that carries state across sessions and agent boundaries.
You’ll leave with a clear framework for when to declare versus when to retrieve, and a harness architecture you can build against immediately.
🎤 Talha Sheikh, AI Engineer at Checkout.com
Talha Sheikh is an AI Software Engineer at Checkout.com, where he works on scalable AI adoption, platform infrastructure, and production AI strategy. He previously worked as a software engineer at TrueLayer and American Express, building financial systems and backend services. Outside work, he is building Vector, a harness designed to help AI agents finish what they start.
8:00 – 8:35 PM | Panel Discussion + Audience Q&A
Join all speakers for an open discussion on memory, retrieval, search, coding agents, and the hidden infrastructure behind reliable AI systems.
Bring your questions.
8:35 – 9:00 PM | Networking
Continue the conversation with speakers, attendees, and other builders working on production AI systems.
9:00 PM onwards | Social
Optional post-event social nearby. Location TBC.
Who Should Attend
AI / ML engineers
Backend and data engineers
Engineers building RAG or agent systems
Engineers working on search, retrieval, memory, or evaluation
Technical founders and product builders
Anyone interested in production AI infrastructure
Please only register if you plan to attend, as capacity is limited.
Sponsored by Qdrant
Qdrant is a high-performance vector database and search engine for building AI applications with advanced retrieval, filtering, and similarity search.
MLOps Community London brings together engineers, researchers, founders, and practitioners building real-world machine learning and AI systems.