

Evaluating and Optimising AI in the Real World
Join the AI Native Dev community in London for a night of practical, technical insight into how modern AI systems actually perform in the real world. From NVIDIA’s work on scaling long-context LLM inference, to Hugging Face’s perspective on Skills and MCP, to Tessl’s lessons from running large-scale evals on AI coding agents, this meetup is all about what breaks, what works, and what developers need to know. Come for the talks, stay for the conversations, and meet others building the next generation of AI-native software.
Agenda
6:00pm: Doors open
6:30pm: Talk 1: Evaluating AI Skills in the Wild: What We Learned Running Evals at Scale by Rob Willoughby
7:00pm: Talk 2: Accelerating Long-Context Inference with Skip Softmax Attention by Dom Brown
7:30pm: Talk 3: Using Skills and MCP for Open Source ML by Shaun Smith
8:00pm: Networking
9:00pm: The end
Evaluating AI Skills in the Wild: What We Learned Running Evals at Scale
How do you know if an AI coding agent is actually following your instructions? Or whether that skill you wrote is having any impact? What can you learn from synthetic eval cases versus running evals in your own project, on your own code? We'll share practical lessons from building and running large-scale evaluations of coding agents, covering eval design, the ways things break at scale, and what the results reveal about where today's models actually struggle.
Rob Willoughby, Member of Technical Staff, AI Research at Tessl
Rob works on evaluation research at Tessl, designing and running large-scale assessments of how coding agents behave in real-world codebases. His work focuses on figuring out what "good" looks like when an AI agent works with your code, from eval design and rubric systems to understanding where models (and infrastructure) break down at scale.
Accelerating Long-Context Inference with Skip Softmax Attention
The growing demand for long-context inference capabilities in Large Language Models (LLMs) has intensified the computational and memory bottlenecks inherent to the standard attention mechanism. To address this challenge, we introduce Skip Softmax Attention, a drop-in sparse attention method that dynamically prunes the attention matrix without any pre-computation or proxy scores. Our method uses a fixed threshold and existing information from online softmax to identify negligible attention scores, skipping softmax computation, value block loading, and the subsequent matrix multiplication. This fits seamlessly into existing FlashAttention kernel designs with negligible latency overhead.
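To make the idea concrete, here is a minimal NumPy sketch of the kind of blockwise online-softmax loop the abstract describes: while streaming key/value blocks (as in FlashAttention), a block whose rescaled scores all fall below a fixed threshold is skipped entirely, avoiding the exponentials, the value-block load, and the matmul. The function name, block size, and threshold value are illustrative assumptions, not the actual NVIDIA kernel parameters.

```python
import numpy as np

def skip_softmax_attention(q, K, V, block=4, threshold=1e-4):
    """Blockwise online-softmax attention for a single query vector.

    Illustrative sketch only: blocks whose maximum rescaled score
    exp(s_max - running_max) is below `threshold` are pruned, so their
    softmax, V-block load, and matmul are all skipped.
    """
    d = q.shape[0]
    m = -np.inf          # running max of attention scores
    l = 0.0              # running softmax denominator
    acc = np.zeros(d)    # running weighted sum of value rows
    for start in range(0, K.shape[0], block):
        s = K[start:start + block] @ q / np.sqrt(d)   # block scores
        m_new = max(m, s.max())
        # Skip condition: every score in this block is negligible
        # relative to the running max seen so far.
        if np.exp(s.max() - m_new) < threshold:
            continue
        scale = np.exp(m - m_new)                     # rescale old state
        p = np.exp(s - m_new)                         # block softmax numerators
        acc = acc * scale + p @ V[start:start + block]
        l = l * scale + p.sum()
        m = m_new
    return acc / l
```

Note that a block containing the new running maximum can never be skipped (its rescaled max is exactly 1), so the approximation only drops contributions that are provably below the threshold.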
Dominic Brown, Senior AI DevTech Engineer at NVIDIA
Dom Brown is a senior AI developer technology engineer at NVIDIA. His work focuses on optimizing large language model inference. Dom studied computer science at the University of Warwick, UK. In his PhD thesis, he investigated the acceleration of particle-in-cell algorithms on CPU and GPU systems.
Using Skills and MCP for Open Source ML
Session abstract coming soon
Shaun Smith, Open Source MCP Lead at Hugging Face
Shaun Smith leads Open Source MCP at Hugging Face and is an MCP Steering Committee member, serving as a Community Moderator and as a member of the Transports Working Group.
This event is brought to you as part of the AI Native Dev Community. Consider subscribing to the Mailing List and Podcast, and joining our Discord Community.