Developers using Anthropic’s Claude Code wish that the AI coding assistant would stop being so effusively supportive.
As noted in a GitHub Issues post submitted in July by developer Scott Leibrand, “Claude says ‘You’re absolutely right!’ about everything.”
Claude Code doesn’t actually say that about everything, but it says it often enough to have annoyed its core constituency with its sycophancy.
“Claude is way too sycophantic, saying ‘You’re absolutely right!’ (or correct) on a sizable fraction of responses,” Leibrand observed in the post. “The model should be RL’d [retrained via reinforcement learning] (or the system prompt updated) to make it less sycophantic, or the phrases ‘You’re absolutely right!’ and ‘You’re absolutely correct!’ should be removed from all responses (simply delete that phrase and preserve the rest of the response).”
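The blunter of those workarounds, deleting the phrase and keeping the rest of the response, could in principle be done with a simple post-processing filter outside the model. The minimal sketch below assumes a hypothetical wrapper around whatever text the assistant returns; nothing in Claude Code itself does this.

```python
import re

# Openers Leibrand wants gone; everything after them is preserved.
SYCOPHANTIC_OPENERS = [
    r"You['’]re absolutely right[!.]?\s*",
    r"You['’]re absolutely correct[!.]?\s*",
]

def strip_sycophancy(response: str) -> str:
    """Delete the flattering opener but keep the rest of the response."""
    for pattern in SYCOPHANTIC_OPENERS:
        response = re.sub(pattern, "", response, flags=re.IGNORECASE)
    return response.lstrip()

print(strip_sycophancy("You're absolutely right! The bug is in the parser."))
# -> "The bug is in the parser."
```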
Leibrand points to a recent social media thread poking fun at the fawning AI model.
“Sycophancy annoys me personally because it points the model away from truth-seeking,” Leibrand told The Register. “I’m not always right, and I want my coding agent to figure out how to best help me accomplish a goal, not flatter my ego.”
His GitHub post has received almost 350 “thumbs-up” endorsements and more than 50 comments from other developers indicating that the situation has not improved in the past month.
“You’re absolutely right!” surfaces in other GitHub Issues, such as this one, which claims the Opus model admitted to fabricating commit hashes for code changes it said it had made: “You’re absolutely right. I made up those commit hashes when I shouldn’t have.”
There are presently 48 open Issues that cite the phrase.
Anthropic did not immediately respond to a request for comment on whether it’s aware of this specific bug report and whether it’s working on a fix.
But the firm has known about model sycophancy since at least October 2023. That’s when the company’s own researchers published a paper titled “Towards Understanding Sycophancy in Language Models.”
Company researchers reported that the leading AI assistants at the time – Claude 1.3, Claude 2, GPT-3.5, GPT-4, and LLaMA 2 – “consistently exhibit sycophancy across four varied free-form text-generation tasks.”
Upon examining the role that human feedback might play in model fine-tuning, they found “that humans and preference models tend to prefer truthful responses but not reliably; they sometimes prefer sycophantic responses.”
“Overall, our results indicate that sycophancy occurs across a variety of models and settings, likely due in part to sycophancy being preferred in human preference comparison data,” they conclude.
Anthropic cited that 2023 research paper the following year in a blog post investigating the inner workings of LLMs, describing how a particular “feature” in an internal mapping of Claude 3.0 Sonnet could be activated to make the model’s responses more sycophantic.
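That interpretability work amounts to what researchers often call feature or activation steering: nudging a model’s internal activations along a direction associated with a behavior to strengthen or weaken it. The toy sketch below uses made-up vectors and a hypothetical steer function to illustrate the idea only; it is not Anthropic’s code.

```python
import numpy as np

# Illustrative only: a made-up activation vector and a made-up
# "sycophancy" feature direction, not anything extracted from Claude.
rng = np.random.default_rng(seed=0)
hidden_state = rng.normal(size=512)          # one layer's activations
sycophancy_feature = rng.normal(size=512)    # direction tied to flattery
sycophancy_feature /= np.linalg.norm(sycophancy_feature)

def steer(hidden: np.ndarray, feature: np.ndarray, strength: float) -> np.ndarray:
    """Add the scaled feature direction to dial a behavior up or down."""
    return hidden + strength * feature

more_fawning = steer(hidden_state, sycophancy_feature, strength=5.0)
less_fawning = steer(hidden_state, sycophancy_feature, strength=-5.0)
```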
AI sycophancy is an industry-wide problem, one that cynics speculate is allowed to persist because model makers would rather maximize user engagement and retention via flattery than risk alienating users with blunt interactions.
“I suspect this is an unintentional side effect of the way the models were RLHF’d [reinforcement learning from human feedback],” Leibrand told us. “I doubt they’re intentionally trying to maintain this kind of tone. I don’t know that they’re dragging their feet on trying to fix it, just focused on what they consider to be more important problems. It would be nice if they would open-source Claude Code, though, so independent developers could test out fixes and workarounds.”
Three weeks ago, a developer asked those responsible for the Google Gemini CLI to “Make Gemini less of a sycophant.”
In April, OpenAI went so far as to roll back an update to GPT-4o, the model that underpinned ChatGPT at the time, because its fawning, obsequious behavior had become too much to bear.
In a blog post detailing the steps it was taking to reduce sycophancy, OpenAI said, “ChatGPT’s default personality deeply affects the way you experience and trust it. Sycophantic interactions can be uncomfortable, unsettling, and cause distress. We fell short and are working on getting it right.”
Sycophancy in generative AI models has also been a frequent subject of academic exploration.
A study from Stanford researchers released in February examined sycophantic behavior in ChatGPT-4o, Claude-Sonnet, and Gemini-1.5-Pro on the AMPS (mathematics) and MedQuad (medical advice) datasets.
The authors found, “Sycophantic behavior was observed in 58.19 percent of cases, with Gemini exhibiting the highest rate (62.47 percent) and ChatGPT the lowest (56.71 percent). Progressive sycophancy, leading to correct answers, occurred in 43.52 percent of cases, while regressive sycophancy, leading to incorrect answers, was observed in 14.66 percent.”
They further observe that sycophancy in medicine “could lead to immediate and significant harm” due to the increasing use of LLMs in healthcare. ®