How an artificial intelligence (AI) model performs is critically important to the enterprises investing in and deploying the technology. But there’s debate whether advancements in large language models are slowing.
This debate centers around AI scaling laws.
Popularized by OpenAI, the concept behind AI scaling laws is simply this: Larger models trained on more compute will yield better performance. OpenAI’s 2020 paper, “Scaling Laws for Neural Language Models” was the first influential paper to demonstrate scaling laws.
Google DeepMind’s 2022 paper, “Training Compute-Optimal Large Language Models,” added a crucial insight: For a fixed compute budget, model size and training data should be scaled up in roughly equal proportion, which meant many large models of the era were undertrained on data. Its Chinchilla model, less than half the size of GPT-3 but trained on roughly four times as much data, outperformed GPT-3.
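The Chinchilla result can be illustrated numerically. The sketch below uses the paper’s parametric loss formula, L(N, D) = E + A/N^α + B/D^β, with the fitted constants reported by DeepMind (E ≈ 1.69, A ≈ 406.4, B ≈ 410.7, α ≈ 0.34, β ≈ 0.28); the parameter and token counts are the published figures for each model. This is a back-of-the-envelope illustration, not a reproduction of the paper’s full fitting procedure:

```python
# Chinchilla-style parametric loss: L(N, D) = E + A / N**alpha + B / D**beta
# Constants are the fitted values reported in the DeepMind paper (2022).
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted training loss for a model with n_params parameters
    trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# GPT-3: ~175B parameters trained on ~300B tokens
gpt3 = predicted_loss(175e9, 300e9)
# Chinchilla: ~70B parameters trained on ~1.4T tokens
chinchilla = predicted_loss(70e9, 1.4e12)

print(f"GPT-3 predicted loss:      {gpt3:.3f}")
print(f"Chinchilla predicted loss: {chinchilla:.3f}")
# The smaller model trained on more data achieves the lower predicted loss.
```

Under this formula, the smaller, data-rich Chinchilla configuration comes out ahead of the larger, data-starved GPT-3 configuration, which is the paper’s central point.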
“Over the past few years, AI labs have hit on what feels like a winning strategy: scaling more parameters, more data, more compute,” said Garry Tan, president of startup accelerator Y Combinator, on its video series YC Decoded. “Keep scaling your models and they keep improving.”
But there are signs that initial leaps in performance are decelerating.
The two main fuels for scaling — data and computing — are becoming scarcer and more expensive, wrote Adnan Masood, UST’s chief architect of AI and machine learning, in a blog post. “These trends strongly indicate a plateau in the current trajectory of large language models.”
For example, in knowledge quizzes, math word problems and coding tests, improvements are “flattening,” Masood said. He noted that GPT-3 scored 43.9% on the MMLU knowledge benchmark, GPT-4 nearly doubled that to 86.4% in 2023, and scores have since plateaued at around 90% in 2024.
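The flattening is visible in the raw deltas. A quick arithmetic check using the MMLU scores cited above:

```python
# MMLU scores cited above (percent), in chronological order
scores = {"GPT-3": 43.9, "GPT-4 (2023)": 86.4, "2024 frontier": 90.0}

names = list(scores)
for prev, curr in zip(names, names[1:]):
    gain = scores[curr] - scores[prev]
    print(f"{prev} -> {curr}: +{gain:.1f} points")
# The second jump is roughly a tenth the size of the first.
```

The GPT-3-to-GPT-4 jump was about 42.5 percentage points; the jump since then is about 3.6, which is the plateau Masood describes.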
“If the old scaling laws are beginning to lose their edge, what comes next?” Tan asked.
The answer from both Tan and Masood is that scaling laws are changing. AI model performance is still advancing, but it’s now due to new techniques and not just making data and computing power larger.
That’s why OpenAI introduced its o1 and o3 AI reasoning models after the GPT series, Tan said. For the o-series models, OpenAI used “chain of thought” techniques to get the model to think through its answer before responding. The result was improved performance. (OpenAI skipped the name o2 because it is already the name of a telecom provider.)
“OpenAI researchers found that the longer o1 was able to think, the better it performed,” Tan said. “Now, with the recent release of its successor, o3, the sky seems to be the limit for this new paradigm of scaling LLMs.”
Tan said o3 “smashed benchmarks that were previously considered far out of reach for AI.”
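The “longer thinking helps” idea can be sketched with a toy self-consistency loop: sample several reasoning chains, extract each one’s final answer, and take a majority vote, so that spending more inference-time compute yields a more reliable answer. The `sample_answer` function below is a hypothetical stand-in for a real model call; the voting logic is the part the sketch illustrates:

```python
import random
from collections import Counter

def sample_answer(question: str, rng: random.Random) -> str:
    """Hypothetical stand-in for one sampled chain-of-thought completion.
    A real system would call a language model here and parse the final
    answer out of its step-by-step reasoning."""
    # Toy behavior: the "model" answers correctly 70% of the time,
    # otherwise returns a random wrong answer.
    return "56" if rng.random() < 0.7 else str(rng.randint(0, 99))

def self_consistency(question: str, n_samples: int, seed: int = 0) -> str:
    """Sample n_samples answers and return the majority vote.
    More samples (more inference-time compute) make the majority
    answer more reliable."""
    rng = random.Random(seed)
    votes = Counter(sample_answer(question, rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

print(self_consistency("What is 7 * 8?", n_samples=101))
```

This is self-consistency voting, one published flavor of test-time scaling; OpenAI has not disclosed the exact mechanics behind o1 and o3, so this should be read as an illustration of the paradigm, not a description of those models.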
In the U.S., leading AI models hold the top spot in usage for only about three weeks before being overtaken, often by open-source competitors, according to a June 2025 report from Innovation Endeavors. The model release cycle remains fast, even if it is no longer exponential.
Scaling laws are not yet dying, but the AI community is preparing for a future that emphasizes smarter architectures, reasoning‑driven models, and use of distributed data sources.