AI Safety South Africa

The assistant axis: situating and stabilizing the character of large language models

If you'd like to see activation capping in action, feel free to experiment with:

This paper has a companion article, so if you'd prefer a lighter read, take a look here:

https://www.anthropic.com/research/assistant-axis

This is a private event. If there is someone who you think would be a good fit for our community, please share this link with them.

Reading Group & Discussion: The assistant axis: situating and stabilizing the character of large language models