Presented by Lorong AI

The One About Synthetic Data & Evals

Registration
Approval required: your registration is subject to approval by the host. Welcome! To join the event, please register below.
About Event

As AI systems grow in scale and complexity, understanding how to generate better data and evaluate models effectively becomes critical. Join us to explore how systematic evaluation, from data synthesis to model reasoning, can enable more transparent, reliable, and adaptive AI development.

More About the Sharings

  • Anshu (AI & Data Privacy Research Engineer, GovTech) will present insights from GovTech’s Data Practice team’s new Synthetic Data Primer, explaining what synthetic data is and why it matters, how it’s created (from classical methods to LLM-driven approaches), and how to evaluate its quality and privacy. The talk highlights practical workflows, multi-dimensional evaluation, and layered privacy-risk assessment, with real-world use cases, challenges, and emerging trends shaping responsible adoption across agencies and industry. (Technical Level: 100-200)

  • Saran Subramanian (Founder, Aqumen.ai) will explore how dynamic benchmarks can better measure conceptual mastery in AI. Building on adversarial methods, he’ll show how Aqumen.ai creates domain-agnostic assessments that detect and analyze errors in AI outputs, aligning technical design with pedagogical principles like the agentic scaffold. Drawing on lessons from existing benchmarks, Saran will then introduce a taxonomy of limitations that evolves with the framework and, through the MCP, supports prompt engineering and AI safety. You’ll also get the chance to see the framework in action as a reinforcement learning environment for language, multimodal, and computer-use agents. (Technical Level: 200)

  • Model Benchmarking & Evals (More information to come soon - keep a lookout! 👀)

  • Want to test the safety of your LLM applications, but find it hard to build trust with your stakeholders? Even as testing and evaluation become democratised, trust remains hard to build: why choose a particular dataset? How reliable is your LLM-as-judge? Hear from the product team at the AI Verify Foundation, who are revamping Project Moonshot to help business/product owners and data scientists in the private sector meet these challenges by building the high-trust benchmark testing recommended in IMDA’s Starter Kit. Get an early peek at the next version of the product, and learn how they improve the trustworthiness of test results by tackling the variance and bias problems associated with LLM-as-judge. (Technical Level: 100)

More About the Speakers

  • Anshu is an AI and Data Privacy Research Engineer at Government Technology Agency (GovTech), Singapore. Before joining GovTech, her research focused on the intersection of computer vision and privacy at the NUS Centre for Research in Privacy Technologies. She holds a Master’s degree in AI from the National University of Singapore and enjoys building practical, user-centric privacy solutions for the public sector by putting research into practice.

  • Saran Subramanian is the founder of Aqumen.ai, an agentic AI-powered assessment platform that measures conceptual mastery by having users detect errors in AI-generated code or reasoning. He previously led the AI team at a funded gifting and e-commerce startup, an early B2C adopter of LLMs. Saran engages deeply with research and safety as an ICLR 2025 attendee and participant in the inaugural Singapore satellite of the ARENA program (RECAP). He studied mathematics at the University of Chicago, developed machine learning skills through MIT's MicroMasters Program in Statistics and Data Science, and holds a Master's in Data Science from the University of Colorado, Boulder.

Psst... interested in becoming a speaker for this session? Sign up here!

Location
Lorong AI (WeWork@22 Cross St.)