

Small Cluster, Big Models: Building and Tuning LLMs with Limited GPU Resources
How do you build competitive large language models without hyperscaler-level compute?
In this virtual event, Benjamin and Fabian from TNG Technology Consulting (Germany) share the story behind the TNG Chimera models, a family of LLMs developed using a novel model-merging approach based on DeepSeek architectures.
What started as small-scale experiments inspired by Mixture-of-Experts (MoE) architectures has evolved into a broader research effort, scaling from a single GPU node to a distributed setup powered by TNG’s in-house AI infrastructure.
This talk will walk through that journey, from early experimentation to large-scale execution, and unpack the practical realities of building modern models under constrained resources.
Expect insights into:
• Model merging and alternative approaches to scaling LLMs
• Lessons learned in moving from single-node to distributed systems
• Trade-offs when working with limited compute
• What smaller teams can realistically achieve in today’s AI landscape
We’ll wrap with an open discussion and audience Q&A.
Speakers:
Benjamin Merkel - Principal Consultant at TNG (Germany)
Dr. Benjamin Merkel joined TNG in 2021 after completing his PhD in physics. He is now a Principal Consultant, responsible for running and growing TNG’s in-house GPU cluster of more than 100 GPUs. For more than two years he has served the largest and most powerful LLMs to over 900 colleagues, with a particular focus on optimizing inference performance.
Fabian Klemm - Senior Software Consultant at TNG (Germany)
Dr. Fabian Klemm completed his doctorate in discrete mathematics and applied geometry at the Technical University of Munich (TUM), where he worked on constrained clustering problems in general geometric spaces. He switched to IT in 2020 and joined TNG as a Software Consultant. Fabian is currently part of the TNG AI research team that published the DeepSeek Chimera models. He also contributes to the TNG Skainet team, which operates TNG’s internal AI server rack.
Format:
30–40 min talk & discussion / Q&A
Audience:
AI engineers, researchers, and founders interested in applied LLM development, infrastructure, and building with constrained compute