AI benchmarks

TL;DR Blind Doctor Test: Doctors preferred Google DeepMind’s AI co-clinician to GPT-5.4-thinking-with-search by 63 to 30 across 98…

OpenAI released GPT-5.5 on April 23, 2026, just six weeks after GPT-5.4, and the model’s performance on independent…

The company said the biggest leap is in agentic coding and computer. On Terminal-Bench 2.0, which tests complex…

A consortium of former OpenAI and Google DeepMind researchers quietly dropped Yahu onto X overnight, triggering global trending,…

Alphabet-X’s Astra model passed the Lovelace Test 2.0 today, scoring 96.8% on the ARC-AGI benchmark and solving a…

ChatGPT Images 2.0 debuted on April 20 and has already claimed the top spot on the GenEval and…

Google DeepMind’s Gemma 4 has landed benchmark scores above both ChatGPT and Gemini Chat, and because it’s open-weight,…

The assumption that the US holds a durable lead in AI model performance is not well-supported by the…

The open-source AI movement has never lacked for options. Mistral, Falcon, and a growing field of open-weight models…

Google DeepMind AI Co-Clinician Tops GPT-5.4 in 98-Query Test But Still Trails Physicians