London Systems Club
London Systems Club is a technical meetup for engineers who care about systems, hardware, and performance. This is not a general tech networking event.
The focus is on low-level and performance-critical engineering: kernels, compilers, storage engines, networking, operating systems, GPUs, and high-performance infrastructure. Topics include latency, throughput, memory bandwidth, cache behaviour, and real production failure modes.
The format is short talks with substantial discussion after each. No sales pitches, no recruitment, and no beginner content.
Schedule
6:00-6:15 - Arrival and intro
6:15-6:35 - Luke Ramsden (CPTO, Architect)
X: https://x.com/lukerramsden
High-performance systems engineering in a garbage-collected language. Real constraints and performance trade-offs from production event-driven systems.
6:35-7:05 - Discussion
7:05-7:25 - Nikita Lapkov (Senior Engineer, Cloudflare)
LinkedIn: https://www.linkedin.com/in/nikitalapkov/
Adaptive Distributed Query Execution. How modern query engines scale analytical workloads, and what breaks in production.
7:25-7:55 - Discussion
7:55-8:15 - Fergus Finn, PhD (CTO, Doubleword)
LinkedIn: https://www.linkedin.com/in/fergusfinn/
How fast can an LLM go? A systems-level look at inference performance, from compute vs bandwidth to prefill vs decode.
8:15-9:00 - Discussion
Pre-reading
For Luke’s talk (required):
Mythbusting Modern Hardware to Gain 'Mechanical Sympathy' – Martin Thompson (TransFICC Thought Leadership Talks)
https://youtu.be/-Fd-JOEI1Nk
For Nikita’s talk (required):
Introduction to query execution: https://youtu.be/E-UUd6cB57w
Optional but recommended: https://15721.courses.cs.cmu.edu/spring2024/papers/08-scheduling/p743-leis.pdf
Optional: https://15721.courses.cs.cmu.edu/spring2024/papers/18-databricks/sigmod_photon.pdf
Paper for discussion: https://www.vldb.org/pvldb/vol17/p3947-bu.pdf
For Fergus’s talk:
Roofline Model (short + essential)
https://en.wikipedia.org/wiki/Roofline_model
→ This is the mental model the blog uses implicitly: compute-bound vs memory-bound, arithmetic intensity, bandwidth ceilings.
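The roofline mental model fits in a few lines. This is a minimal sketch with made-up hardware numbers (the peak-FLOPs and bandwidth figures below are illustrative assumptions, not any specific chip's datasheet values):

```python
def attainable_flops(arithmetic_intensity, peak_flops, peak_bandwidth):
    """Roofline: attainable throughput is the lesser of the compute ceiling
    and the bandwidth-limited rate (bandwidth * FLOPs per byte moved)."""
    return min(peak_flops, peak_bandwidth * arithmetic_intensity)

PEAK_FLOPS = 100e12   # 100 TFLOP/s, assumed for illustration
PEAK_BW = 2e12        # 2 TB/s, assumed for illustration

# Ridge point: the arithmetic intensity where the two ceilings meet.
# Below it a kernel is memory-bound; above it, compute-bound.
ridge = PEAK_FLOPS / PEAK_BW  # 50 FLOPs/byte

for ai in (1, 10, 50, 200):
    regime = "memory-bound" if ai < ridge else "compute-bound"
    print(f"intensity {ai:>3} FLOPs/byte -> "
          f"{attainable_flops(ai, PEAK_FLOPS, PEAK_BW)/1e12:.0f} TFLOP/s ({regime})")
```

Useful to keep in mind while reading: raising bandwidth moves the ridge left, raising peak compute moves it right.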
How LLM Inference Works (KV cache + decoding cost) – Arpit Bhayani
https://arpitbhayani.me/blogs/how-llm-inference-works
→ Explains exactly where the FLOPs and memory traffic come from during prefill vs decode, which the inference arithmetic builds on.
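The prefill-vs-decode gap can be seen with back-of-envelope arithmetic: for one d×d weight matrix in fp16, the weights must be read regardless of how many tokens are processed, so intensity scales with token count. The numbers here are illustrative, not from any real model:

```python
def arithmetic_intensity(n_tokens, d):
    """FLOPs per byte of weight traffic for one d x d matmul, fp16 weights.
    Ignores activation traffic to keep the sketch minimal."""
    flops = 2 * n_tokens * d * d   # one multiply-add per weight per token
    bytes_moved = 2 * d * d        # fp16 weights read once (2 bytes each)
    return flops / bytes_moved     # simplifies to n_tokens FLOPs/byte

print(arithmetic_intensity(1, 4096))    # decode, 1 token:    1.0 FLOP/byte
print(arithmetic_intensity(512, 4096))  # prefill, 512 tokens: 512.0 FLOPs/byte
```

This is why decode sits far below the ridge point (memory-bound) while prefill can saturate compute, and why batching decode requests raises throughput.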
