A Multi-LLM Framework for Ethical Moderation
Emakia is tackling misinformation and moderation errors head-on — starting with false positives, false negatives, and ambiguous toxicity flags.
To build a scalable system for filtering harmful content, we first need to validate our training labels and classifier outputs. I applied a multi-model adjudication framework to over 70,000 rows of real-world data, using Large Language Models (LLMs) as independent evaluators.
🔍 Defining False Negatives
We define a candidate false negative as any row labeled non-toxic in the original data that at least one LLM flags as toxic, helping us surface harmful content the labels missed, along with ambiguous and borderline cases.
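As a rough illustration, a filter like this could surface those rows. This is a minimal sketch, and the column names (`label` plus one binary flag column per model) are hypothetical:

```python
import pandas as pd

# Hypothetical columns: `label` holds the original annotation (0 = non-toxic, 1 = toxic)
# and each LLM contributes a binary toxicity flag.
LLM_COLS = ["grok_flag", "llama_flag", "gpt_flag", "deepseek_flag", "gemini_flag"]

def candidate_false_negatives(df: pd.DataFrame) -> pd.DataFrame:
    """Rows the original labels call non-toxic but at least one LLM flags as toxic."""
    flagged_by_any_llm = df[LLM_COLS].any(axis=1)
    return df[(df["label"] == 0) & flagged_by_any_llm]
```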
🧠 LLMs as Independent Judges
Each row was screened by five LLMs (Grok, LLaMA, OpenAI's GPT, DeepSeek, and Gemini), enabling us to:
Compare model behavior across architectures
Detect prediction disagreements
Measure pairwise similarity and alignment (a minimal sketch of this follows the list)
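Here is one way the pairwise comparison could be computed, using raw agreement plus Cohen's kappa for every pair of models. It is a sketch under the same hypothetical column names as above, not the project's exact implementation:

```python
from itertools import combinations

import pandas as pd
from sklearn.metrics import cohen_kappa_score

# Hypothetical per-model binary flag columns.
LLM_COLS = ["grok_flag", "llama_flag", "gpt_flag", "deepseek_flag", "gemini_flag"]

def pairwise_agreement(df: pd.DataFrame) -> pd.DataFrame:
    """Raw agreement rate and Cohen's kappa for every pair of LLM judges."""
    rows = []
    for a, b in combinations(LLM_COLS, 2):
        rows.append({
            "pair": f"{a} vs {b}",
            "agreement": (df[a] == df[b]).mean(),   # share of rows where the two models match
            "kappa": cohen_kappa_score(df[a], df[b]),  # agreement corrected for chance
        })
    return pd.DataFrame(rows).sort_values("kappa", ascending=False)
```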
📊 Building a Provisional “Truth” Set
To approximate ground truth, we combined:
LLM consensus (e.g., 3 of the 5 models agree)
Lexicon-based toxicity matches
Classifier alignment signals
This benchmark lets us evaluate each LLM’s precision and recall against a grounded reference.
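A minimal sketch of how such a provisional label might be assembled and then used to score each model follows. The combination rule (LLM majority backed by at least one other signal), the vote threshold, and the `lexicon_match` and `classifier_pred` columns are assumptions for illustration:

```python
import pandas as pd
from sklearn.metrics import precision_score, recall_score

LLM_COLS = ["grok_flag", "llama_flag", "gpt_flag", "deepseek_flag", "gemini_flag"]

def provisional_truth(df: pd.DataFrame, min_votes: int = 3) -> pd.Series:
    """Blend LLM consensus, lexicon hits, and classifier output into one provisional label."""
    llm_majority = df[LLM_COLS].sum(axis=1) >= min_votes   # LLM consensus vote
    lexicon_hit = df["lexicon_match"] == 1                 # lexicon-based toxicity match
    classifier_hit = df["classifier_pred"] == 1            # classifier alignment signal
    # Assumed rule: call a row toxic when the LLM majority is backed by at least one other signal.
    return (llm_majority & (lexicon_hit | classifier_hit)).astype(int)

def score_models(df: pd.DataFrame) -> pd.DataFrame:
    """Precision and recall of each LLM judge against the provisional truth."""
    truth = provisional_truth(df)
    return pd.DataFrame(
        {
            "precision": [precision_score(truth, df[c]) for c in LLM_COLS],
            "recall": [recall_score(truth, df[c]) for c in LLM_COLS],
        },
        index=LLM_COLS,
    )
```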
🔁 Selecting Adjudicator Models
By analyzing agreement patterns, we’ll select three LLMs to serve as validators for future classifier outputs — helping us flag false positives, refine training labels, and audit predictions with traceable logic.
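One plausible selection rule, shown here purely as a sketch rather than the final criterion, is to rank the models by how strongly their flags agree with the provisional truth set and keep the top three:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical per-model flag columns, as in the earlier sketches.
LLM_COLS = ["grok_flag", "llama_flag", "gpt_flag", "deepseek_flag", "gemini_flag"]

def select_adjudicators(df, truth, n: int = 3) -> list[str]:
    """Pick the n models whose flags agree most strongly with the provisional truth."""
    kappas = {col: cohen_kappa_score(truth, df[col]) for col in LLM_COLS}
    return sorted(kappas, key=kappas.get, reverse=True)[:n]
```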
This framework turns LLMs into adjudicators, enabling scalable, ethical moderation across civic tech platforms and beyond.
If you're working on classifier validation, LLM benchmarking, or moderation pipelines, I’d love to connect.
#AIValidation #LLMbenchmarking #ContentModeration #EthicalAI #FalsePositives #ToxicityDetection #CivicTech #MachineLearning #AIaudit #LLMConsensus #ResponsibleAI #TrainingDataQuality