Anthropic gives Claude AI power to end conversations as part of 'model welfare' push


In the fast-moving world of artificial intelligence, a new feature or model seems to launch almost every day, but one feature that no one saw coming comes from Anthropic, the maker of the popular AI chatbot Claude. The AI startup is now giving some of its AI models the ability to end conversations on Claude as part of its exploratory work on 'model welfare'.

“This is an experimental feature, intended only for use by Claude as a last resort in extreme cases of persistently harmful and abusive conversations,” the company states.

Anthropic says that the vast majority of users will never experience Claude ending a conversation on its own.

Moreover, the company adds that Claude's conversation-ending ability is a last resort for when multiple attempts at redirection have failed and “hope of a productive interaction has been exhausted, or when a user explicitly asks Claude to end a chat”.

“The scenarios where this will occur are extreme edge cases—the vast majority of users will not notice or be affected by this feature in any normal product use, even when discussing highly controversial issues with Claude,” Anthropic adds.

Why is Anthropic adding a conversation-ending ability to Claude?

Anthropic says that the moral status of Claude and other large language models (LLMs) remains highly uncertain, meaning there is no clarity yet on whether these AI systems could ever feel anything like pain, distress or well-being.

However, the AI startup is taking this possibility seriously and believes it is important to investigate it. In the meantime, the company is also exploring “low-cost interventions” that could potentially reduce harm to AI systems, and allowing the LLM to end a conversation is one such method.

Anthropic says that it tested Claude Opus 4 before its release, and part of that testing was a “model welfare assessment”. The company found that Claude consistently rejected requests where there was a possibility of harm.

When users kept pushing for dangerous or abusive content even after refusals, the AI model's responses began to show signs of stress or discomfort. Some of the requests where Claude showed signs of 'distress' involved generating sexual content around minors or attempts to solicit information that would enable large-scale violence or acts of terror.
