A Multi-LLM Framework for Ethical Moderation

Hosted by Corinne David
About Event

Emakia is tackling misinformation and moderation errors head-on — starting with false positives, false negatives, and ambiguous toxicity flags.

To build a scalable system for filtering harmful content, we first need to validate our training labels and classifier outputs. We applied a multi-model adjudication framework to over 70,000 rows of real-world data, using Large Language Models (LLMs) as independent evaluators.
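
A minimal sketch of that adjudication loop, assuming each model is wrapped in a hypothetical `classify(text) -> bool` helper (the wrapper names and the pandas layout are illustrative, not the production client code):

```python
import pandas as pd

def adjudicate(rows: pd.DataFrame, judges: dict) -> pd.DataFrame:
    """Ask every judge whether each row's text is toxic and record the votes."""
    verdicts = rows.copy()
    for name, classify in judges.items():
        # One boolean column per model, e.g. "grok_toxic", "gemini_toxic", ...
        verdicts[f"{name}_toxic"] = verdicts["text"].apply(classify)
    return verdicts

# judges = {"grok": grok_classify, "llama": llama_classify, ...}  # hypothetical wrappers
# results = adjudicate(data, judges)
```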

🔍 Defining False Negatives

We define a false negative as any row labeled toxic where at least one LLM also flags it as toxic, helping us surface ambiguous, borderline, or over-flagged cases.
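
In code, that rule reduces to a single filter. The sketch below assumes the verdict table produced by the adjudication loop above, with one boolean `*_toxic` column per model and an original `label` column; all column names are illustrative:

```python
MODEL_COLS = ["grok_toxic", "llama_toxic", "openai_toxic", "deepseek_toxic", "gemini_toxic"]

def false_negative_candidates(df):
    """Rows labeled toxic where at least one LLM also flags them as toxic."""
    labeled_toxic = df["label"] == "toxic"        # original dataset label (assumed column)
    any_llm_flag = df[MODEL_COLS].any(axis=1)     # at least one model agrees
    return df[labeled_toxic & any_llm_flag]
```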

🧠 LLMs as Independent Judges

Each row was screened by Grok, LLaMA, OpenAI, DeepSeek, and Gemini, enabling us to:

  • Compare model behavior across architectures

  • Detect prediction disagreements

  • Measure pairwise similarity and alignment (see the sketch after this list)
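
A minimal sketch of the pairwise comparison, using raw agreement rate and Cohen's kappa as the similarity measures (the specific metrics are an assumption for illustration; column names follow the verdict table above):

```python
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

def pairwise_agreement(df, model_cols):
    """Raw agreement rate and Cohen's kappa for every pair of model verdict columns."""
    stats = {}
    for a, b in combinations(model_cols, 2):
        stats[(a, b)] = {
            "agreement": (df[a] == df[b]).mean(),      # share of rows with identical verdicts
            "kappa": cohen_kappa_score(df[a], df[b]),  # chance-corrected agreement
        }
    return stats
```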

📊 Building a Provisional “Truth” Set

To approximate ground truth, we combined three signals, sketched in code after the list:

  • LLM consensus (e.g., 3 of 4 agree)

  • Lexicon-based toxicity matches

  • Classifier alignment signals
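
How these signals are weighted is not fixed; the sketch below shows one simple combination, treating LLM consensus as the primary vote and lexicon or classifier hits as corroborating evidence (the `lexicon_hit` and `classifier_toxic` columns and the exact rule are illustrative assumptions):

```python
def provisional_truth(df, model_cols, lexicon_col="lexicon_hit",
                      classifier_col="classifier_toxic", consensus=3):
    """Mark a row as provisionally toxic when enough independent signals agree."""
    llm_votes = df[model_cols].sum(axis=1)                # count of models flagging toxic
    strong_consensus = llm_votes >= consensus             # e.g. 3+ models agree
    corroborated = df[lexicon_col] | df[classifier_col]   # lexicon match or classifier flag
    near_consensus = llm_votes >= consensus - 1
    return strong_consensus | (near_consensus & corroborated)
```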

This benchmark lets us evaluate each LLM’s precision and recall against a grounded reference.
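
For example, with scikit-learn the per-model scores against that provisional truth column might look like this (`provisional_toxic` is the illustrative column produced above):

```python
from sklearn.metrics import precision_score, recall_score

def score_model(df, model_col, truth_col="provisional_toxic"):
    """Precision and recall of one LLM's verdicts against the provisional truth set."""
    y_true = df[truth_col].astype(int)
    y_pred = df[model_col].astype(int)
    return {"precision": precision_score(y_true, y_pred),
            "recall": recall_score(y_true, y_pred)}
```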

🔁 Selecting Adjudicator Models

By analyzing agreement patterns, we’ll select three LLMs to serve as validators for future classifier outputs — helping us flag false positives, refine training labels, and audit predictions with traceable logic.
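
One possible selection criterion, shown purely as a sketch: rank each model by its mean agreement with the other models and keep the top three (the real selection may also weigh precision and recall against the provisional truth set):

```python
def select_adjudicators(df, model_cols, k=3):
    """Rank models by mean agreement with every other model and keep the top k."""
    mean_agreement = {}
    for col in model_cols:
        others = [c for c in model_cols if c != col]
        mean_agreement[col] = sum((df[col] == df[o]).mean() for o in others) / len(others)
    return sorted(mean_agreement, key=mean_agreement.get, reverse=True)[:k]
```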

This framework turns LLMs into adjudicators, enabling scalable, ethical moderation across civic tech platforms and beyond.

If you're working on classifier validation, LLM benchmarking, or moderation pipelines, I’d love to connect.

#AIValidation #LLMbenchmarking #ContentModeration #EthicalAI #FalsePositives #ToxicityDetection #CivicTech #MachineLearning #AIaudit #LLMConsensus #ResponsibleAI #TrainingDataQuality

Location
virtual