Presented by
AI Safety Poland
AI Safety Poland is a community in Poland dedicated to reducing the risks posed by artificial intelligence.
24 Went

AI Safety Poland Talks #10

Google Meet
Past Event
About Event

Welcome to AI Safety Poland Talks!

A biweekly series where researchers, professionals, and enthusiasts from Poland or connected to the Polish AI community share their work on AI Safety.

💁 Topic: Multi-layer Prototypes for Efficient Safety Moderation
📣 Speaker: Maciej Chrabąszcz
🇬🇧 Language: English
🗓️ Date: 19.03.2026, 18:00
📍 Location: Online

Speaker Bio
Maciej Chrabąszcz is an AI Safety Researcher at NASK National Research Institute and a PhD student at Warsaw University of Technology. His research focuses on using models' internal representations to detect and mitigate unsafe behaviors.

Abstract
Although modern LLMs are aligned with human values during post-training, robust moderation remains essential to prevent harmful outputs at deployment time. Existing approaches suffer from performance-efficiency trade-offs and are difficult to customize to user-specific requirements. Motivated by this gap, we introduce Multi-Layer Prototype Moderator (MLPM), a lightweight and highly customizable input moderation tool. We propose leveraging prototypes of intermediate representations across multiple layers to improve moderation quality while maintaining high efficiency. By design, our method adds negligible overhead to the generation pipeline and can be seamlessly applied to any model. MLPM achieves state-of-the-art performance on diverse moderation benchmarks and demonstrates strong scalability across model families of various sizes. Moreover, we show that it integrates smoothly into end-to-end moderation pipelines and further improves response safety when combined with output moderation techniques. Overall, our work provides a practical and adaptable solution for safe, robust, and efficient LLM deployment.
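
To make the idea of "prototypes of intermediate representations across multiple layers" more concrete, here is a minimal illustrative sketch. It is not the MLPM implementation from the talk. It assumes that each prompt has already been turned into one feature vector per layer, that a prototype is simply the per-class mean of those vectors, and that layers are combined by averaging cosine-similarity differences. The function names (`fit_prototypes`, `unsafe_score`, `is_flagged`), the labels `"safe"` and `"unsafe"`, and the synthetic data are all hypothetical.

```python
import math
import random


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fit_prototypes(examples, labels):
    """Build one prototype per (layer, label).

    examples[i][l] is the feature vector of prompt i at layer l.
    Assumption: a prototype is the mean vector of its class at that layer.
    """
    num_layers = len(examples[0])
    prototypes = []
    for layer in range(num_layers):
        per_label = {}
        for label in set(labels):
            vectors = [ex[layer] for ex, y in zip(examples, labels) if y == label]
            per_label[label] = mean_vector(vectors)
        prototypes.append(per_label)
    return prototypes


def unsafe_score(example, prototypes):
    """Average over layers of (similarity to unsafe) - (similarity to safe)."""
    diffs = [
        cosine(example[layer], protos["unsafe"]) - cosine(example[layer], protos["safe"])
        for layer, protos in enumerate(prototypes)
    ]
    return sum(diffs) / len(diffs)


def is_flagged(example, prototypes, threshold=0.0):
    """Flag the prompt when it sits closer to the unsafe prototypes."""
    return unsafe_score(example, prototypes) > threshold


def synthetic_example(label, rng, num_layers=3):
    """Stand-in for real hidden states: noisy points around a class center."""
    center = [1.0, 0.0, 0.0, 0.0] if label == "safe" else [0.0, 1.0, 0.0, 0.0]
    return [[c + rng.gauss(0.0, 0.1) for c in center] for _ in range(num_layers)]


rng = random.Random(0)
train_labels = ["safe"] * 20 + ["unsafe"] * 20
train_examples = [synthetic_example(label, rng) for label in train_labels]
prototypes = fit_prototypes(train_examples, train_labels)

print(is_flagged(synthetic_example("unsafe", rng), prototypes))
print(is_flagged(synthetic_example("safe", rng), prototypes))
```

In a real pipeline the per-layer vectors would come from the hidden states that the model already computes while reading the prompt, which is one way to read the abstract's claim of negligible overhead. How the actual method builds, weights, and combines prototypes is a detail of the talk and is not reproduced here.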
