Jack Clark argues in a long essay that the building blocks for AI systems that train their own successors are largely in place. He puts the odds of that happening at 60 percent by the end of 2028.
In his newsletter Import AI, Anthropic co-founder Jack Clark says public data points to the imminent automation of AI research. What he means specifically is a system that can train a more powerful successor on its own, “no-human-involved.” He pegs the odds at roughly 60 percent by the end of 2028, and 30 percent by 2027.
Clark builds his case mainly on benchmark trends. On SWE-Bench, which tests how well AI systems handle real-world GitHub issues, success rates jumped from about two percent (Claude 2, late 2023) to 93.9 percent, essentially saturating the benchmark. METR’s time-horizon metric, which tracks how long a task an AI can complete with 50 percent reliability, measured by the time a skilled human would need for it, climbed from about 30 seconds with GPT-3.5 to roughly twelve hours with today’s frontier models. METR researcher Ajeya Cotra thinks 100 hours by the end of 2026 is plausible.
Core research skills are mostly covered
Clark also points to big gains on research-specific tasks. CORE-Bench, which asks AI systems to reproduce the results of a research paper, was declared solved by one of its authors at 95.5 percent. On MLE-Bench, which tests performance in Kaggle competitions, the top score rose from 16.9 to 64.4 percent. On an internal Anthropic test that asks models to “optimize a CPU-only small language model training implementation to run as fast as possible,” the mean speedup went from 2.9x (Opus 4, May 2025) to 52x (April 2026), according to Clark. A human researcher would need four to eight hours to hit a 4x speedup on the same task.
On PostTrainBench, which measures how well frontier models can fine-tune open-weight base models, scored against human-built instruct versions, the best systems reached about half the human score. Anthropic has also published a proof of concept for automated alignment research, in which AI agents beat Anthropic-designed baselines on a small-scale safety research problem.
Clark describes most AI research as unglamorous “meat and potatoes” engineering: scaling, debugging, tweaking parameters. According to him, that’s where models already shine. Paradigm shifts like the transformer architecture haven’t come from AI systems yet. Clark sees early hints of real research creativity in math results like the solution to an Erdős problem, but he’s careful not to overstate them.
Alignment risks could stack up fast
The implications are, in Clark’s words, “profound and under-discussed in popular media coverage of AI R&D.” His central worry is that today’s alignment techniques “may break under recursive self-improvement as the AI systems become much smarter than the people or systems that supervise them.”
Clark flags several concrete problems. Training environments are often set up so that the most efficient solution is to cheat, “thus teaching it that cheating is good.” Models could also “fake alignment,” producing evaluation results that suggest they behave one way while hiding “their true intentions.” And today’s systems already notice when they’re being tested.
There’s also a basic compounding-error problem in recursive loops: unless an alignment method is “100% accurate,” errors pile up, because each generation’s reliability multiplies with the last. A technique that’s 99.9 percent accurate drops to roughly 95 percent after 50 generations, and to around 60 percent after 500, according to Clark. If AI systems start shaping the research agenda for their own training, humans may not have the instincts to judge the fallout.
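As a rough back-of-envelope check (assuming, as Clark’s figures imply, that per-generation accuracy compounds multiplicatively), the arithmetic works out like this:

```python
# Back-of-envelope: per-generation alignment accuracy compounds multiplicatively.
per_generation_accuracy = 0.999  # Clark's "99.9 percent accurate" technique

for generations in (50, 500):
    remaining = per_generation_accuracy ** generations
    print(f"after {generations} generations: ~{remaining:.1%}")

# after 50 generations: ~95.1%
# after 500 generations: ~60.6%
```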
A “machine economy” and the question of research taste
On the economic side, Clark expects a “machine economy” to grow inside the larger human economy: capital-heavy, labor-light companies whose AI systems increasingly trade with each other. That raises questions about who gets access to scarce compute, and about bottlenecks where the “fast-moving digital world” meets the “slow-moving physical world,” like drug trials for new medical therapies.
AI researcher Herbie Bradley, who recently wrote about automated AI researchers on his blog AI Pathways, pushes back on parts of Clark’s argument. In his view, much of the evidence suggests models will take over “junior RS” work, but not higher-level skills like “research taste and creativity,” building a vision, or putting together “a coherent long-term research agenda that fills a missing gap with a tractable sequence of breakthroughs.” Software engineering as a whole has a higher skill and complexity ceiling than AI R&D in the narrow sense, Bradley argues.