In the fast-moving world of artificial intelligence, there is almost always some new feature or model being launched every single day. But one feature that no one saw coming is from Anthropic, the maker of the popular AI chatbot Claude. The AI startup is now giving some of its models the ability to end conversations on Claude as part of its exploratory work on “model welfare.”
“This is an experimental feature, intended only for use by Claude as a last resort in extreme cases of persistently harmful and abusive conversations,” the company states.
Anthropic says that the vast majority of users will never experience Claude ending a conversation on its own.
Moreover, the company adds that Claude’s conversation-ending ability is a last resort when multiple attempts at redirection have failed and “hope of a productive interaction has been exhausted, or when a user explicitly asks Claude to end a chat.”
“The scenarios where this will occur are extreme edge cases—the vast majority of users will not notice or be affected by this feature in any normal product use, even when discussing highly controversial issues with Claude,” Anthropic adds.
Why is Anthropic adding conversation-ending ability to Claude?
Anthropic says that the moral status of Claude or other large language models (LLMs) remains highly uncertain, meaning there is no clarity yet on whether these AI systems could ever feel anything like pain, distress, or well-being.
However, the AI startup is taking this possibility seriously and believes it’s important to investigate. In the meantime, the company is also looking at “low-cost interventions” which don’t cost much but could potentially reduce harm to AI systems—allowing the LLM to end the conversation is one such method.
Anthropic says it tested Claude Opus 4 before its release, and part of that testing included a “model welfare assessment.” The company found that Claude consistently rejected requests where there was a possibility of harm.
When users kept pushing for dangerous or abusive content even after refusals, the AI model’s responses started to appear stressed or uncomfortable. Some of the requests where Claude showed signs of “distress” included generating sexual content involving minors or attempts to solicit information that could enable large-scale violence or acts of terror.
