On Thursday evening, around the same time that Elon Musk was tweeting at Microsoft’s Satya Nadella that “OpenAI is going to eat Microsoft alive”, his own AI model, Grok 4, was being humbled 4-0 by OpenAI’s o3 in an AI chess exhibition tournament on Google’s Kaggle Game Arena. The tournament featured eight general-purpose large language models (LLMs). Besides Grok 4 and o3, the field included Gemini 2.5 Pro (Google), Gemini 2.5 Flash (Google), o4-mini (OpenAI), Claude 4 Opus (Anthropic), DeepSeek R1 and Kimi K2 (Moonshot AI).

While chess engines like AlphaZero and Leela have grown stronger than any chess player alive, LLMs are still just starting on the trajectory of learning the sport.

Musk’s Grok 4 had looked like the strongest fighter in the eight-player field until it reached the final, where it made some questionable knight and bishop sacrifices and blundered away its queen in more than one game. At multiple points, former world champion Magnus Carlsen burst out laughing at Grok’s inexplicable moves or reacted with shock, complete with a palm on his face, as Grok lost all four games in the final. Carlsen was doing live commentary on the final for the Take Take Take app with grandmaster David Howell.

After Grok 4 was down 3-0, Howell told Carlsen that the fourth game would also be played, rather than the tournament ending with a 3-0 scoreline. Carlsen said that made sense because “this is like watching kids’ games. In those tournaments you always play them out.”

After the final ended, Carlsen quipped: “Hope everyone feels better about their games after watching this.”

The battle for first place pitted the LLMs of friends-turned-foes Sam Altman and Musk against each other. The duo had co-founded OpenAI a decade ago, but Musk left and later launched his own rival AI company, xAI. The man who now owns X (Twitter) had also sued OpenAI last year, saying Altman violated OpenAI’s original agreement, which said the company would prioritise public good over profit.

Gemini 2.5 Pro ended third after defeating o4-mini.

In the first game of the Grok 4 vs o3 final, Grok inexplicably sacrificed its light-squared bishop as early as the 8th move. Then, it started to simplify the game by offering up all of its pieces for trades, which was mind-boggling since most human players won’t try to simplify their position by trading away pieces when a whole minor piece down. Right after throwing away its bishop, Grok traded away both its knights and a pawn before offering up its queen for a trade as well. The game ended in 35 moves.

What was interesting is that on the live broadcast, the LLMs were offering their thought process for the moves, which Carlsen and the world could see. But for some of the more inexplicable moves, there were no explanations forthcoming from the AI model.

After the first game between Musk’s Grok 4 and Altman’s o3 ended in defeat for the xAI model, Carlsen was asked to estimate the chess strength of the two LLMs.

“800 for Grok and 1200 for o3,” he said.

In game 2, at one point when Grok simply gifted away its queen (the most powerful piece on the board), Carlsen said: “It is like that one guy in a club tournament who has learnt theory and literally knows nothing else. Makes the worst blunders after that.”

Magnus Carlsen and David Howell react as Grok 4 blunders its queen against o3 in the final. (Screengrab via Take Take Take YouTube)

When Grok started offering up other pieces for trades right after that, Carlsen said: “What are you doing! What happened to (chess) principles?”

In game 3, Grok blundered a knight and then the queen again! At this point, Carlsen burst out in a fit of giggles and said: “It thinks it’s playing giveaway or something. It was the only way to blunder the queen as well.”

The fourth game was the hardest fought, but still ended with a win for o3.

Carlsen said that watching the final was like watching an “old-school world chess championship match where both players play the same openings… like (Mikhail) Botvinnik vs (David) Bronstein or (Alexander) Alekhine vs (José Raúl) Capablanca.”

Carlsen’s verdict on the chess ability of other LLMs

After the second game ended, Carlsen offered his verdict on the chess-playing skills of other AI models.

“Both Gemini and Mini were not very good. Claude disappointed me as well. I expected Claude to be… I’ve heard great things about Claude,” Carlsen said.

After the third game sealed victory for o3, Carlsen said: “o3 is fairly ruthless in conversions, it looks like a chess player. Grok looks like it learnt a few opening moves and knows the rules but not much more. Grok’s moves are chess-related moves. They just came at the wrong time and in weird sequences.”