

Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs
Jackson Kaunismaa presents his new paper, “Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs.” He will discuss why output-level safeguards on frontier models don’t make the broader ecosystem safe: anyone with an open-source model can fine-tune it on adjacent-domain outputs from safeguarded models, recovering a large fraction of the capability gap between open-source and frontier models on harmful tasks.
Event Schedule
6:00 pm to 6:30 pm - Food and introductions
6:30 pm to 7:30 pm - Presentation and Q&A
7:30 pm to 9:00 pm - Open discussion
If you can't attend in person, join our live stream starting at 6:30 pm via this link.
This is part of our weekly AI Safety Thursdays series. Join us in examining questions like:
How do we ensure AI systems are aligned with human interests?
How do we measure and mitigate potential risks from advanced AI systems?
What does safer AI development look like?