

MunichNLPxTNG: Model Merging in LLMs & Memory in LLMs
Event Description:
Explore the latest in Natural Language Processing and Large Language Models at the Munich NLP in-person meetup. Join us on February 10, 2026 at TNG Technology Consulting. Doors open at 18:00 and the program starts at 18:30. We’ll feature two expert talks on cutting-edge LLM applications, followed by pizza and networking. All are welcome!
Agenda
18:00 - Doors Open
18:30 - Welcome + Intro to MunichNLP and TNG Technology Consulting GmbH
18:40 - AI Research @ TNG: How to process 20 billion tokens per day on OpenRouter - Henrik Klagges, Fabian Klemm & Daniel Klingmann
19:20 - Break (5 min)
19:25 - Benchmarking Memory in LLMs: Retrieval, Long Context, and Multi-Turn Interaction - Ali Modarressi
20:05 - Pizza + Networking
21:30 - Close
Talk abstracts (2 × 30 min + Q&A)
AI Research @ TNG: How to process 20 billion tokens per day on OpenRouter - Henrik Klagges, Fabian Klemm & Daniel Klingmann
TNG combined efficient use of limited GPU resources with innovative approaches to construct high-performance child LLMs from DeepSeek parent models. The talk outlines some of the approaches that worked, some that did not, and how the resulting model variants differ. The practical relevance of these variants is demonstrated empirically by the more than 20 billion tokens they process every day, with peaks of 105k requests per hour. The Chimera models, for example, made TNG one of OpenRouter's Top 10 open-source model creators.
Benchmarking Memory in LLMs: Retrieval, Long Context, and Multi-Turn Interaction - Ali Modarressi
As LLM systems rely increasingly on retrieval, long contexts, and extended interaction, benchmarking how reliably they can access and use information becomes essential. I will first discuss controlled evidence that dense retrievers can be biased toward heuristic cues (favoring shorter documents, earlier mentions, repeated entities, or literal matches), sometimes ranking such documents above answer-containing evidence. I will then introduce implicit fact retrieval settings in which relevance depends on facts stated only implicitly in documents, requiring temporal, arithmetic, or world-knowledge inference even for simple queries. Next, I will cover long-context evaluation beyond literal matching, showing that performance degrades substantially as context grows once lexical overlap cues are removed. Finally, I will discuss dialogue-conditioned settings for extended interactions that quantify drift and trade-offs in persona consistency, instruction following, and safety behavior over long conversations. I will conclude by briefly highlighting how these benchmarks can inform design choices when building memory-augmented LLM systems.
Bio: Ali Modarressi
Ali Modarressi is a third-year PhD student at the Center for Information and Language Processing (CIS) at LMU Munich, supervised by Prof. Hinrich Schütze. Their current research focuses on memory-augmented large language models and, more broadly, long-context language modeling. They have also worked on interactive language generation and information extraction. Ali began their NLP research during their MSc under the supervision of Mohammad Taher Pilehvar, studying explainability methods and the interpretability of pre-trained language models, topics that remain relevant to their current work, particularly in analyzing retrieval models and knowledge probing.
Sponsors:
Thank you to TNG Technology Consulting GmbH for sponsoring and supporting the organization of this event.
Organizer:
Munich NLP