

The One About LLM Safety (ft. OCBC AI Lab)
As LLMs take on increasingly complex evaluation and retrieval tasks, safety mechanisms are essential to ensure reliability, alignment, and robustness. Join us to explore applied approaches to LLM safety, featuring tools for assessing judge trustworthiness and detecting unsafe or incorrect model outputs in enterprise RAG workflows.
More About the Sharings
Leanne (Data Scientist, GovTech) will be sharing MetaEvaluator, an open-source tool that helps you figure out which LLM judges you can trust. Explore how to manage your LLM judges, collect human annotations, and measure judge-human alignment using comprehensive agreement metrics. The tool can help you scale your evaluation workflow and make data-driven decisions about which judge is most aligned with humans. (Technical Level: 200)
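To make "judge-human alignment" concrete, here is a minimal sketch of one common chance-corrected agreement metric, Cohen's kappa, computed between an LLM judge's labels and human annotations. This is an illustrative stdlib-only implementation, not MetaEvaluator's actual code or its full metric suite.

```python
from collections import Counter

def cohens_kappa(judge_labels, human_labels):
    """Chance-corrected agreement between an LLM judge and a human annotator.

    Returns 1.0 for perfect agreement, 0.0 for agreement no better than
    chance, and negative values for worse-than-chance agreement.
    """
    assert len(judge_labels) == len(human_labels), "paired labels required"
    n = len(judge_labels)
    # Observed agreement: fraction of items where judge and human match.
    observed = sum(j == h for j, h in zip(judge_labels, human_labels)) / n
    # Expected agreement by chance, from each rater's label frequencies.
    judge_freq = Counter(judge_labels)
    human_freq = Counter(human_labels)
    expected = sum(judge_freq[c] * human_freq[c] for c in judge_freq) / n**2
    if expected == 1:
        return 1.0  # degenerate case: both raters always give one label
    return (observed - expected) / (1 - expected)
```

For example, if a judge and a human agree on 3 of 4 pass/fail verdicts, the raw agreement is 0.75, but kappa discounts the portion expected by chance, which is why agreement metrics like this are more informative than simple accuracy when deciding which judge to trust.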
As generative AI becomes more widely adopted to uplift employee productivity, enterprises require a high degree of accuracy when generating answers from referenced documents for tasks such as summarization or Q&A. Joven (Data Scientist, OCBC) will share more on Ragulator V3, a lightweight in-house tool that OCBC developed to flag out-of-context LLM responses in RAG use cases. This work was presented at the EMNLP 2025 conference. (Technical Level: 200)
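To illustrate the problem space, here is a deliberately naive grounding check that flags response sentences with low content-word overlap against the retrieved context. This is a toy baseline for intuition only, and is not Ragulator's method; the function name and threshold are invented for this sketch.

```python
import re

def flag_ungrounded(response: str, context: str, threshold: float = 0.5):
    """Toy out-of-context detector for RAG outputs.

    Flags each response sentence whose word overlap with the retrieved
    context falls below `threshold`. Real systems use far stronger
    signals (entailment models, claim verification) than lexical overlap.
    """
    ctx_words = set(re.findall(r"[a-z']+", context.lower()))
    flagged = []
    # Split the response into sentences on terminal punctuation.
    for sent in re.split(r"(?<=[.!?])\s+", response.strip()):
        words = set(re.findall(r"[a-z']+", sent.lower()))
        if not words:
            continue
        overlap = len(words & ctx_words) / len(words)
        if overlap < threshold:
            flagged.append(sent)
    return flagged
```

A sentence fully supported by the context scores high overlap and passes, while a fabricated sentence shares few words with the source and gets flagged; the hard cases that motivate tools like Ragulator are hallucinations that reuse the context's vocabulary while asserting something it never said.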
Advances in generative AI now make it possible to create highly realistic images, videos, and even audio that can be leveraged for impersonations, posing reputational and financial risks to both businesses and individuals. Seok Min (Asst Director, IMDA) will present technical experiments conducted by IMDA’s BizTech Group, including a live demo of how AI-generated impersonations are created, highlighting their potential harms and emphasising the importance of safeguarding against these risks. (Technical Level: 100)
More About the Speakers
Leanne is a Data Scientist at GovTech Singapore, working on Responsible AI where she builds capabilities in AI safety and evaluations. Her work focuses on developing tools to assess and safeguard LLMs and their applications. Previously, she worked on continual pre-training and fine-tuning of LLMs for specialised use cases.
Joven Heng is a Data Scientist in OCBC's Risk and Customer Experience team, where he designs and deploys AI-driven solutions to mitigate risk and enhance customer journeys. With a bachelor's degree in business and a second major in computer science, he develops applications ranging from sentiment analysis of news articles for early risk signals to LLM-powered classification of customer feedback. His experience building agentic AI workflows and RAG models, particularly challenges with hallucination detection, sparked his interest in developing guardrails to improve the reliability of generative AI systems.
Seok Min leads the technical research team in the Infocomm Media Development Authority (IMDA), driving initiatives in trust-related technologies such as AI testing and digital watermarking. She began her career in cybersecurity research and has more recently focused on AI governance testing (AI Verify) and the evaluation of generative models (Project Moonshot).
Psst... Interested in becoming a speaker for our sessions? Sign up here!