AI Breakthroughs But At A Cost

Published this week, the 2026 Stanford HAI AI Index shows that AI models are achieving breakthrough results in science and complex reasoning. AI’s workforce disruption has now moved from prediction to reality, hitting young workers first.

Launched in 2017 and now in its ninth edition, the Artificial Intelligence Index tracks worldwide trends in AI using a mix of its own new research and findings from many other sources. It is led by a steering committee of academic and industry experts and produced by the Stanford Institute for Human-Centered AI.
This year’s report runs to 423 pages, comprising nine chapters and an Appendix with references and links to further research.

One of the report’s key charts compares AI performance with human performance, something we first looked at for the 2024 AI Index Report. Since then more benchmarks have been added to the chart, making a total of 11, and there are now only three on which AI has yet to surpass human performance. Agent multimodal computer use (OSWorld), the latest to be added, is the one with a noticeable gap still to make up, at less than 80%. Mathematical Reasoning (AIME) and Autonomous Software Engineering (SWE-bench Verified) are both close to 100%. The highest score on any of the benchmarks is for PhD-level science questions (GPQA Diamond), followed closely by Competition-level mathematics (MATH) and Multimodal understanding and reasoning (MMMU).

The United States is outspending every other country on AI. However, while it previously outpaced all other global regions (in model size, performance, artificial intelligence research, citations, and more), China has now almost erased its lead. In the following chart showing the Elo rating for each of the 8 most popular models on the LMSYS Chatbot Arena, the models from Chinese companies Alibaba and DeepSeek, with scores of 1,449 and 1,424, are only slightly behind the leaders Anthropic (1,503), xAI (1,494), Google (1,494) and OpenAI (1,481).

aiiperformance
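To put those rating gaps in perspective, here is a minimal sketch that converts a difference in Elo rating into an expected head-to-head win rate. It assumes the textbook Elo formula with a scale of 400; the leaderboard's own rating method may differ in detail, and the function name `expected_win_rate` is purely illustrative. Only some of the ratings quoted above are used.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Textbook Elo expected score of A against B (a tie counts as half a win).

    Assumption: the standard logistic Elo curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# A selection of the ratings quoted in the article.
ratings = {
    "Anthropic": 1503,
    "Google": 1494,
    "OpenAI": 1481,
    "Alibaba": 1449,
    "DeepSeek": 1424,
}

leader = "Anthropic"
for name, rating in ratings.items():
    if name == leader:
        continue
    gap = ratings[leader] - rating
    p = expected_win_rate(ratings[leader], rating)
    print(f"{leader} vs {name}: gap of {gap} points, expected win rate {p:.1%}")
```

Under this assumption, the 54-point gap between the top-rated model and Alibaba's corresponds to an expected win rate of about 58% for the leader, which is not far from a coin flip.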

The report comments:

U.S. and Chinese models have traded places at the top of the performance rankings multiple times since early 2025. In February 2025, DeepSeek-R1 briefly matched the top U.S. model, and as of March 2026 Anthropic’s top model leads by just 2.7%. The U.S. still produces more top-tier AI models and higher-impact patents, while China leads in publication volume, citations, patent output, and industrial robot installations.

While reporting on performance breakthroughs, the AI Index Report also raises concerns including AI’s environmental impact:

Grok 4’s estimated training emissions reached 72,816 tons of CO2 equivalent. AI data center power capacity rose to 29.6 GW, comparable to New York state at peak demand, and annual GPT-4o inference water use alone may exceed the drinking water needs of 12 million people.

There is also a human cost. As already reported on I Programmer (see Computer Science Under Threat), AI is impacting entry-level jobs.

AAI Jobs

As this chart shows, U.S. developers ages 22 to 25 saw employment fall nearly 20% from 2024, even as the headcount for older developers continues to grow.


More Information

Artificial Intelligence Index

Artificial Intelligence Index Report 2026

Insights From AI Index 2024 Report
