Private Event

Unsupervised Learning 2.0 @ Constellation

Hosted by Isaac Dunn, Sasha Berezhnoi & Henry Sleight
Berkeley, California
Past Event
About Event

We're hosting Unsupervised Learning again — a day to explore important recent developments in frontier AI safety. (You will be doing the unsupervised learning, not the models!)

For confirmed speakers so far, see the schedule below.

Everything is optional, so drop in and out of sessions as you find useful. We’ll also have plenty of time for 1:1 meetings between people working on similar projects, and space to catch up on work and relax.

This is an invite-only event for researchers from frontier AI companies and some independent research orgs. Know someone who'd like to come? They can request an invite on Luma.

Register here soon to secure your spot. Late RSVPs are welcome, though we may not be able to accommodate everyone – reach out to [email protected] if you have questions.

Schedule (All Optional)

9:00 — Doors open for breakfast, conversations, and co-working

11:30 — Lightning talks on the future of misalignment

Including lightning talks from:

  • Buck Shlegeris, Redwood Research

  • Alex Turner, Google DeepMind

  • Holden Karnofsky, Anthropic

  • Hjalmar Wijk, METR

12:30 — Lunch

1:30 — Optional research discussions

Opt-in short talks and discussion sessions throughout the afternoon — with space for conversations, 1:1s, co-working, and relaxing.

  • Stress testing deliberative alignment for anti-scheming training

    • Jenny Nitishinskaya, OpenAI

  • Q&A on alignment evaluation and misalignment safety cases

    • Sam Bowman, Anthropic

  • Q&A with John Schulman (Thinking Machines)

  • AI resilience

    • Wojciech Zaremba, OpenAI

  • What should the spec be for very capable models?

    • Jason Wolfe, OpenAI

    • Joe Carlsmith, Anthropic

  • Natural misalignment from reward hacking in production RL

    • Evan Hubinger, Anthropic

  • Extrapolating METR's agent time horizon graph

    • Nikola Jurkovic, METR

  • Improving alignment by shaping generalization

    • Sam Marks, Anthropic

  • Preventing reward-compatible misalignment: randomisation games for reinforcement learning

    • Jacob Pfau, UK AISI

  • A mainline plan for mitigating misalignment risk

    • Ryan Greenblatt, Redwood Research

  • Believe it or not: how deeply do LLMs believe implanted facts?

    • Stewart Slocum, xAI (work done at Anthropic)

6:00 — Dinner

You’re welcome to stay as long as you like for dinner and more conversations.

Where?

Sessions will be hosted in Constellation’s workspace in Downtown Berkeley. Meals, snacks, drinks, and workspace amenities (workstations, call booths, and meeting rooms) are available to all attendees.

The space will be open from 9:00 until late for conversations and co-working between sessions. Accommodations for specialized workspace needs are available on request (contact Isaac, [email protected]).

Who Should Attend?

We welcome people who work on frontier AI safety, or who work at a frontier AI company and are interested in safety. We value thoughtful critics and varied perspectives as an important part of a robust truth-seeking culture – asking challenging questions helps shape productive discussions.

“Very cool people, good vibes, great conversations”

— Adrien Ecoffet, OpenAI

What is Constellation? 

We are an independent, nonprofit research center hosting over 200 individual researchers and organizations working on frontier problems in AI safety. You'll connect with many of our affiliates throughout the event, both as speakers and fellow attendees.

Interested in working from Constellation for a day? Just let Isaac know the date you’ll be coming!

Contact Information

For questions, requests, or more information, email Isaac: [email protected]
