LLMs and Alignment. Research Talks: Dr. Shantipriya Parida
The Research Talks by LLMs and Alignment continue. At our third event, Dr. Shantipriya Parida will present his research, "LLMs for All: Building Inclusive Models for Low-Resource Languages."
In a world of approximately 7,000 languages, the benefits of Large Language Models (LLMs) remain concentrated on a small subset, leaving most low- and ultra-low-resource languages underserved. For these languages, LLM performance is constrained by data scarcity, limited tooling, and insufficient technological support. This talk addresses a central question: how can we design AI systems that truly serve all languages? It covers the challenges and the end-to-end process of building LLMs for low- and ultra-low-resource languages, explored through two representative case studies: Odia (an Indic language) as a low-resource language and Sámi (a Uralic language) as an ultra-low-resource language. Drawing on these examples, we discuss practical approaches such as multilingual pretraining, transfer learning, tokenization strategies, and community-driven data development, and highlight key technical and practical challenges, design decisions, and lessons learned in building inclusive language technologies. We conclude with a vision for scalable and equitable AI that extends the benefits of LLMs to all language communities.
