What happened after 2,000 people tried to hack my AI assistant

Simon WillisonJun 27, 2026 at 02:33 AM7.0/10

A challenge involving 2,000 participants and 6,000 attempts to inject prompts into an OpenClaw AI instance running Opus 4.6 resulted in zero successful secret leaks. This outcome highlights the increasing effectiveness of frontier models in resisting prompt injection attacks, although experts caution that absolute security is not guaranteed.

Background

Prompt injection remains a significant vulnerability in LLM deployments, where malicious inputs can manipulate model behavior. Recent advancements in model training, such as those seen in Opus 4.6 and GPT-5.6, aim to mitigate these risks through robust anti-injection protocols.

Source: Simon Willison
Published: Jun 27, 2026 at 02:33 AM
Score: 7.0 / 10

Read Original →