

The One About Synthetic Data & Evals
As AI systems grow in scale and complexity, generating better data and evaluating models effectively become critical. Join us to explore how systematic evaluation, from data synthesis to model reasoning, can enable more transparent, reliable, and adaptive AI development.
More About the Sharings
Anshu (AI & Data Privacy Research Engineer, GovTech) will present insights from GovTech’s Data Practice team’s new Synthetic Data Primer, explaining what synthetic data is and why it matters, how it’s created (from classical methods to LLM-driven approaches), and how to evaluate its quality and privacy. The talk highlights practical workflows, multi-dimensional evaluation, and layered privacy-risk assessment, with real-world use cases, challenges, and emerging trends shaping responsible adoption across agencies and industry. (Technical Level: 100-200)
Tham Zheng Kang (Data Scientist, CareerKaki) will be sharing on LLM Benchmarking for CareerKaki, a career chatbot that aims to provide an easy, personalized, and trustworthy way to think about careers. He will show how careful metric choices and deliberate methodology planning make evaluation more than an academic exercise, tying the technical elements closely to product objectives to deliver tailored insights into what works for the product. (Technical Level: 200)
Saran Subramanian (Founder, Aqumen.ai) will discuss dynamic benchmarks and their importance in evaluating model performance. The talk will trace how creative reward design has enabled the assessment of qualitative outputs such as summaries, and touch on how pedagogical principles can inform evaluation design. Future directions for adaptive evaluation in the evolving AI landscape will be woven throughout. (Technical Level: 200)
Struggling to build stakeholder trust when testing LLM application safety? The AI Verify Foundation product team will share how they're revamping Project Moonshot to address key challenges: dataset selection, LLM-as-judge reliability, and variance in test results. Get an early preview of the next version and learn how it helps business owners and data scientists meet IMDA's benchmark testing recommendations while building high-trust evaluation processes. (Technical Level: 100)
More About the Speakers
Anshu is an AI and Data Privacy Research Engineer at Government Technology Agency (GovTech), Singapore. Before joining GovTech, her research focused on the intersection of computer vision and privacy at the NUS Centre for Research in Privacy Technologies. She holds a Master’s degree in AI from the National University of Singapore and enjoys translating research into practical, user-centric privacy solutions for the public sector.
Tham Zheng Kang is a product Data Scientist for CareerKaki. He implements the LLM-related features and handles evaluation and analytics for the team. Previously, he led data science projects at MOE, from engaging business divisions to scoping and execution, with the aim of enhancing student outcomes.
Saran Subramanian is the founder of Aqumen.ai, an agentic AI-powered assessment platform that measures conceptual mastery by having users detect errors in AI-generated code or reasoning. He previously led the AI team at a funded gifting and e-commerce startup, an early B2C adopter of LLMs. Saran engages deeply with research and safety as an ICLR 2025 attendee and participant in the inaugural Singapore satellite of the ARENA program (RECAP). He studied mathematics at the University of Chicago, developed machine learning skills through MIT's MicroMasters Program in Statistics and Data Science, and holds a Master's in Data Science from the University of Colorado, Boulder.