

Beyond Scaling: Making Large Language Models Efficient
Transformer architectures have enabled the rapid progress of modern language models, but scaling them efficiently remains a major challenge. As models grow larger, compute cost, memory usage, inference latency, and deployment complexity increase significantly.
In this session, Abhay will discuss practical research directions to improve transformer efficiency in small and medium-sized language models.
The talk will cover architectural experiments, training optimisations, efficient attention mechanisms, memory-efficient techniques, and inference-focused design choices being explored while building open-source LLMs at FrontiersMind.
Speaker:
Abhay Kumar is a co-founder of FrontiersMind, an AI research lab focused on efficient small and medium-sized language models optimised for enterprise and real-world deployment use cases.
Hugging Face: - https://huggingface.co/FrontiersMind
LinkedIn: https://www.linkedin.com/in/akanyaani/
To attend online:
Add to calendar: https://shorturl.at/rGZU1
Gmeet link: meet.google.com/uzb-mmdh-wjs
Pre-read:
Basics of Transformer architectures
Looking forward to seeing you!