A cosplayer dressed as Green Goblin during New York Comic Con on 8 October 2022 in New York City (Getty Images)
OpenAI has solved a “goblin mystery” impacting ChatGPT that caused the AI chatbot to become obsessed with the mythical creatures.
Over the last six months, mentions of the word ‘goblin’ have shot up in ChatGPT, even in response to unrelated queries. The phenomenon prompted an investigation by OpenAI researchers, who found that the bug “crept in subtly” following the release of a new ChatGPT model last November.
The new model was designed to be “smarter and more conversational” than its predecessors, featuring a variety of personality settings like ‘Nerdy’, ‘Candid’, and ‘Quirky’.
Shortly after its release, ChatGPT users and researchers began noticing a pattern of repeated mentions of goblins, gremlins and other fantasy creatures.
“Starting with GPT-5.1, our models began developing a strange habit: they increasingly mentioned goblins, gremlins, and other creatures in their metaphors,” OpenAI notes in a blog post about the issue.
“We unknowingly gave particularly high rewards for metaphors with creatures. From there, the goblins spread.”
Safety researchers at the company reported a 175 per cent increase in mentions of the word ‘goblin’ following the release of GPT-5.1 as a result of the model being incentivised to use playful metaphors.
The training method was not corrected in subsequent models, and by the time GPT-5.4 launched in March, use of ‘goblin’ in the Nerdy personality setting had risen by nearly 4,000 per cent, with mentions increasing by a similar proportion across the other models.
“The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them,” OpenAI noted.
“Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data.”
The glitch was relatively harmless in this instance, but it points to a broader flaw in leading artificial intelligence models and the way they are trained and developed.
Reinforcement learning, and the reward signals it relies on, can cause AI models to drift in unexpected and unintended ways.
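The dynamic OpenAI describes can be illustrated with a toy sketch. The code below is not OpenAI's training pipeline; it is a minimal, hypothetical REINFORCE-style update on a two-token "policy", where a small extra reward for the ‘goblin’ token steadily shifts the policy towards it, in the same way a bonus for playful metaphors can entrench a verbal tic.

```python
import math
import random

# Toy illustration (not OpenAI's actual method): a softmax policy over
# two tokens, trained with a REINFORCE-style update. A modest reward
# bonus for "goblin" is enough to push the policy heavily towards it.

random.seed(0)

TOKENS = ["plain", "goblin"]
logits = [0.0, 0.0]          # initial policy: 50/50 between the tokens
LEARNING_RATE = 0.1
BASELINE = 1.0               # baseline reward, so only the bonus drives updates

def probs(logits):
    """Softmax over the two logits."""
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reward(token):
    # Base reward of 1.0 for any answer, plus a small (hypothetical)
    # bonus a rater gives to creature metaphors.
    return 1.0 + (0.5 if token == "goblin" else 0.0)

for step in range(2000):
    p = probs(logits)
    idx = 0 if random.random() < p[0] else 1   # sample a token
    advantage = reward(TOKENS[idx]) - BASELINE
    # REINFORCE gradient for a softmax policy:
    # d log pi(idx) / d logit_k = (1 if k == idx else 0) - p[k]
    for k in range(2):
        grad = ((1.0 if k == idx else 0.0) - p[k]) * advantage
        logits[k] += LEARNING_RATE * grad

final = probs(logits)
print(f"P(goblin) after training: {final[1]:.2f}")
```

Because the advantage is zero whenever the plain token is sampled, every update is driven by the goblin bonus alone, and the policy drifts towards goblins even though nothing ever penalises the plain answer.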
OpenAI said its research and safety team has built new ways to investigate rogue patterns and will be conducting more audits of model behaviour in the future.