Presented by
Trajectory Labs

Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs

Past Event
About Event

Jackson Kaunismaa presents his new paper “Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs”. He will discuss why output-level safeguards on frontier models don’t actually make the ecosystem safe, and how anyone with an open-source model can fine-tune it on adjacent-domain outputs from safeguarded models to recover a large fraction of the capability gap between open-source and frontier models on harmful tasks.

Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open Discussions

If you can't attend in person, join our live stream starting at 6:30 pm via this link.

This is part of our weekly AI Safety Thursdays series. Join us in examining questions like:

  • How do we ensure AI systems are aligned with human interests?

  • How do we measure and mitigate potential risks from advanced AI systems?

  • What does safer AI development look like?

Location
30 Adelaide St E
Toronto, ON M5C 3G8, Canada
Enter the main lobby of the building and let the security staff know you are here for the AI event. You may need to show your RSVP on your phone. You will be directed to the 12th floor where the meetup is held. If you have trouble getting in, give Georgia a call at 519-981-0360.