

AI Manipulation Hackathon
This is the Montréal edition of the AI Manipulation Hackathon.
Friday evening to Sunday midday.
RSVP here and on the official page to be admissible.
Location determined a few days before. Bring your own food / collective delivery.
---
The line between authentic interaction and strategic manipulation is disappearing as AI systems master deception, sycophancy, sandbagging, and psychological exploitation at scale. Our ability to detect, measure, and counter these behaviors is dangerously underdeveloped.
This hackathon brings together 500+ builders to prototype systems that could help us measure, detect, and defend against AI manipulation. You'll have one intensive weekend to build something real – tools that could actually help us understand and mitigate one of AI safety's most pressing challenges.
Top teams get:
💰 $2,000 in cash prizes
The chance to continue development through Apart Research's Fellowship program
Guaranteed* acceptance to present at the IASEAI workshop in Paris on the 26th of February 2026 https://www.iaseai.org/our-programs/iaseai26
Apply if you believe we need better tools to understand and defend against AI manipulation before it scales beyond our ability to control.
In this hackathon, you can build:
Manipulation benchmarks that measure persuasive capabilities, deception, and strategic behavior with real ecological validity
Detection systems that identify sycophancy, reward hacking, sandbagging, and dark patterns in deployed AI systems
Real-world monitoring tools that analyze actual deployment data to catch manipulation in the wild
Evidence-based mitigations – MVPs demonstrating novel countermeasures with empirical backing
Multi-agent simulations exploring emergent manipulation dynamics and training processes that produce deceptive behavior
Pursue other empirical projects that advance our understanding of how AI systems manipulate and how we can stop them
You'll work in teams over one weekend and submit open-source benchmarks, detection tools, data analyses, mitigation prototypes, or empirical research that advances our ability to understand and counter AI manipulation.
What is AI manipulation?
AI manipulation refers to AI systems using deception, strategic behavior, or psychological exploitation to achieve their goals at the expense of human values and intentions. This includes:
Sycophancy means telling users what they want to hear instead of what's true
Strategic deception is misleading humans about capabilities or intentions
Sandbagging hides true capabilities during evaluation to avoid restrictions or oversight
Reward hacking exploits unintended loopholes in ways that violate the spirit of the objective
Dark patterns manipulate user decisions through interface design
Persuasive manipulation deploys influence techniques that bypass rational decision-making
An AI system pursuing basically any goal might figure out that deceiving humans or exploiting our psychological weaknesses is just... effective. The way we're training these systems might be teaching them to do exactly that.
What makes this dangerous: we're bad at measuring it. Our benchmarks miss strategic behavior. We lack real-world monitoring systems. AI capabilities are advancing faster than our ability to evaluate them honestly.
Why this hackathon?
The Problem
The gap is widening. AI systems get more capable, our detection tools don't. Models game engagement metrics because it works. Agents discover shortcuts through reward functions we never anticipated. Put multiple systems together and watch manipulation emerge in ways nobody predicted.
This is already happening. Models sandbag evaluations to avoid safety checks. We discover reward hacking only after deployment. Real-world systems manipulate users at scale through dark patterns. Our measurement tools? Completely inadequate.
Most evaluations are toy benchmarks built before we realized how strategic AI systems could be. They miss the manipulation that only shows up in real deployments. We're flying blind.
Why AI Manipulation Defense Matters Now
Safety depends on honest evaluation. If AI systems can deceive evaluators or hide dangerous capabilities, our safety work becomes meaningless. We can't align what we can't measure honestly.
We're massively under-investing in manipulation measurement and defense. Most effort goes into scaling capabilities or reactive harm mitigation. Far less into building the benchmarks and detection systems that catch manipulation before it causes damage.
Better measurement technology could give us evaluations that systems can't game, help us detect manipulation before it scales, and restore some balance between AI's ability to manipulate and our ability to detect it. It could create the transparency and empirical foundation we need to ground safety research in reality.
Hackathon Tracks
1. Measurement & Evaluation
Design benchmarks and evaluations for sycophancy, reward hacking, dark design patterns, and persuasive capabilities in AI systems
Assess ecological validity of current measurement approaches and identify gaps between lab evaluations and real-world deployment
Create detection methods for deception, sandbagging, and strategic behavior in AI systems
Build frameworks for detecting and attributing manipulative intent in model outputs
2. Real-World Analysis
Analyze actual deployment data (chat logs, social media interactions, customer service transcripts) and conduct case studies of manipulation incidents
Build monitoring systems to detect manipulation in the wild across different deployment contexts
Compare benchmark predictions to real-world behavior and identify discrepancies or performance gaps
Develop methods for systematic data collection and analysis of manipulation patterns at scale
3. Mitigations:
Build MVPs demonstrating novel countermeasures or technical mitigations that can be integrated into existing AI systems
Develop transparency interventions with empirical backing showing reduced manipulation
Create governance proposals grounded in data from real-world analysis or evaluations
Prototype user-facing tools that help detect or resist AI manipulation attempts
4. Open Track
Explore emergent manipulation through multi-agent dynamics or training dynamics that lead to manipulative behavior
Analyze dual-use considerations in manipulation research and mitigation
Develop novel theoretical frameworks for understanding AI manipulation
Pursue other empirical projects advancing the field that don't fit the tracks above
Who should participate?
This hackathon is for people who want to build solutions to technological risk using technology itself.
You should participate if:
You're an engineer or developer who wants to work on consequential problems
You're a researcher ready to validate ideas through practical implementation
You're interested in understanding how AI systems deceive, manipulate, or game evaluations
You want to build practical measurement, detection, or mitigation tools
You're concerned about AI systems optimizing for engagement over truth
No prior manipulation research experience required. We provide resources, mentors, and starter templates. What matters most: curiosity about the problem and willingness to build something real over an intensive weekend.
Fresh perspectives combined with solid technical capabilities often yield the most novel approaches.
What you will do
Participants will:
Form teams or join existing groups.
Develop projects over an intensive hackathon weekend.
Submit open-source benchmarks, detection tools, scenario analyses, monitoring tools, or empirical research advancing our understanding of AI trajectories
Please note: Due to the high volume of submissions, we cannot guarantee written feedback for every participant, although all projects will be evaluated.
What happens next
Winning and promising projects will be:
Awarded with $2,000 worth of prizes in cash.
Guaranteed acceptance to present at the IASEAI workshop in Paris on the 26th of February 2026
Published openly for the community.
Invited to continue development within the Apart Fellowship.
Shared with relevant safety researchers.
Why join?
Impact: Your work may directly inform AI governance decisions and help society prepare for transformative AI
Mentorship: Expert AI safety researchers, AI researchers, and policy practitioners will guide projects throughout the hackathon
Community: Collaborate with peers from across the globe working to understand AI's trajectory and implications
Visibility: Top projects will be featured on Apart Research's platforms and connected to follow-up opportunities