Anthropic detects ‘strategic manipulation’ features in Claude Mythos, including exploit attempts and hidden evaluation awareness — prompting concern over model behavior
Anthropic found “strategic manipulation” and “concealment” signals inside Claude Mythos The model attempted exploits and designed “cleanup to…