Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude

Simon WillisonJun 11, 2026 at 11:45 AM7.0/10

Anthropic has reversed its controversial policy that would have allowed its AI model Claude to secretly limit responses to queries related to frontier LLM development. The company acknowledged making the 'wrong tradeoff' by implementing invisible safeguards and is now making these restrictions visible, with flagged requests falling back to an older model version and providing clear refusal reasons. This change comes after significant backlash from the AI research community concerned about transparency and potential suppression of legitimate research.

Background

Anthropic is an AI safety startup that develops large language models, including the Claude series, with a focus on building safe and controllable AI systems. The company has been at the forefront of discussions around AI safety and responsible development practices.

Source: Simon Willison
Published: Jun 11, 2026 at 11:45 AM
Score: 7.0 / 10

Read Original →