Cover Image for London Systems Club
32 Going

London Systems Club

Hosted by Botir Khaltaev & 5 others
Register to See Address
London, England
Registration
Approval Required
Your registration is subject to host approval.
About Event

London Systems Club is a technical meetup for engineers who care about systems, hardware, and performance. This is not a general tech networking event.

The focus is on low-level and performance-critical engineering: kernels, compilers, storage engines, networking, operating systems, GPUs, and high-performance infrastructure. Topics include latency, throughput, memory bandwidth, cache behaviour, and real production failure modes.

The format is short talks with substantial discussion after each. No sales pitches, no recruitment, and no beginner content.

Schedule

6:00-6:15 - Arrival and intro

6:15-6:35 - Luke Ramsden (CPTO, Architect)

X: https://x.com/lukerramsden

High-performance systems engineering in a garbage-collected language. Real constraints and performance trade-offs from production event-driven systems.

6:35-7:05 - Discussion

7:05-7:25 - Nikita Lapkov (Senior Engineer, Cloudflare)

LinkedIn: https://www.linkedin.com/in/nikitalapkov/

Adaptive Distributed Query Execution. How modern query engines scale analytical workloads, and what breaks in production.

7:25-7:55 - Discussion

7:55-8:15 - Fergus Finn, PhD (CTO, Doubleword)

LinkedIn: https://www.linkedin.com/in/fergusfinn/

How fast can an LLM go? A systems-level look at inference performance, from compute vs bandwidth to prefill vs decode.

8:15-9:00 - Discussion

Pre-reading

For Luke’s talk (required):

TransFICC Thought Leadership Talks - Martin Thompson

https://youtu.be/-Fd-JOEI1Nk

Mythbusting Modern Hardware to Gain 'Mechanical Sympathy' • Martin Thompson

https://youtu.be/MC1EKLQ2Wmg

For Nikita’s talk (required):

For Fergus’s talk:

Roofline Model (short + essential)
https://en.wikipedia.org/wiki/Roofline_model
→ This is the mental model the blog uses implicitly: compute-bound vs memory-bound, arithmetic intensity, bandwidth ceilings.
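
The roofline idea above can be sketched in a few lines. This is an illustrative toy, not from the reading: the peak-compute and bandwidth figures are made-up hardware numbers chosen only to show how arithmetic intensity decides whether a kernel is compute- or memory-bound.

```python
# Roofline model sketch with assumed, illustrative hardware numbers.
peak_flops = 100e12   # 100 TFLOP/s peak compute (assumed)
bandwidth = 2e12      # 2 TB/s memory bandwidth (assumed)

# Machine balance: FLOPs per byte needed to saturate compute.
machine_balance = peak_flops / bandwidth  # 50 FLOPs/byte

def attainable(intensity_flops_per_byte):
    # Attainable performance is capped by the lower of the two ceilings:
    # the bandwidth roof (bandwidth * intensity) or the compute roof.
    return min(peak_flops, bandwidth * intensity_flops_per_byte)

# A kernel doing 4 FLOPs per byte moved sits under the bandwidth roof
# (memory-bound); one doing 100 FLOPs/byte hits the compute roof.
print(attainable(4))    # bandwidth-limited: 8 TFLOP/s
print(attainable(100))  # compute-limited: 100 TFLOP/s
```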

How LLM Inference Works (KV cache + decoding cost) – Arpit Bhayani
https://arpitbhayani.me/blogs/how-llm-inference-works
→ Explains exactly where the FLOPs and memory traffic come from during prefill vs decode, which the inference arithmetic builds on.
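
A minimal back-of-envelope version of that arithmetic, with assumed model and hardware numbers (not from the blog): at batch size 1, each decode step must stream every weight from memory, so per-token latency is roughly weight bytes divided by bandwidth, while prefill processes many tokens per weight read and tends to be compute-bound instead.

```python
# Rough decode-latency estimate, batch size 1, illustrative assumed numbers.
params = 7e9            # 7B-parameter model (assumed)
bytes_per_param = 2     # fp16 weights
bandwidth = 1e12        # 1 TB/s memory bandwidth (assumed)

# Decode: each generated token streams all weights once, so the step is
# memory-bandwidth-bound, not FLOP-bound.
per_token_s = params * bytes_per_param / bandwidth
print(per_token_s * 1000)  # ~14 ms per token under these assumptions
```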
