{"id":244346,"date":"2025-07-07T04:14:26","date_gmt":"2025-07-07T04:14:26","guid":{"rendered":"https:\/\/www.europesays.com\/uk\/244346\/"},"modified":"2025-07-07T04:14:26","modified_gmt":"2025-07-07T04:14:26","slug":"128-days-later-semianalysis","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/uk\/244346\/","title":{"rendered":">128 Days Later\u00a0 \u2013 SemiAnalysis"},"content":{"rendered":"<p>SemiAnalysis is hiring an analyst in New York City for Core Research, our world class research product for the finance industry.\u00a0<a href=\"https:\/\/app.dover.com\/apply\/SemiAnalysis\/6ec0e8df-3da0-469c-9422-0c8d5dd624a7\/?rs=76643084\" target=\"_blank\" rel=\"noreferrer noopener\">Please apply here<\/a><\/p>\n<p>It\u2019s been a bit over 150 days since the launch of the Chinese LLM DeepSeek R1 shook stock markets and the Western AI world. R1 was the first model to be publicly released that matched OpenAI\u2019s reasoning behavior. However, much of this was overshadowed by the fear that DeepSeek (and China) would commoditize AI models given the <a href=\"https:\/\/api-docs.deepseek.com\/quick_start\/pricing\" target=\"_blank\" rel=\"noreferrer noopener\">extremely low price<\/a> of $0.55 input\/$2.19 output, undercutting the then SOTA model o1 by 90%+ on output token pricing. 
Reasoning model prices have dropped significantly since, with OpenAI recently cutting its flagship model price by 80%.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1157\" height=\"623\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/07\/image-28.png\" alt=\"\" class=\"wp-image-150441466\" \/>Source: SemiAnalysis, Company prices<\/p>\n<p>R1 received an update as DeepSeek continued to scale RL after release. This resulted in the model improving in many domains, particularly coding. This continuous development and improvement is a hallmark of the new paradigm we previously covered.<\/p>\n<p>Today we look at DeepSeek\u2019s impact on the AI model race and the state of AI market share.<\/p>\n<p>A Boom and\u2026 Bust?<\/p>\n<p>Consumer app traffic to DeepSeek spiked following release, resulting in a sharp increase in market share. 
Because Chinese usage is poorly tracked and Western labs are blocked in China, the numbers below understate DeepSeek\u2019s total reach. However, that explosive growth has not kept pace with other AI apps, and DeepSeek\u2019s market share has since declined.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1230\" height=\"736\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/07\/image-29.png\" alt=\"\" class=\"wp-image-150441467\" \/>Source: SemiAnalysis, SensorTower<\/p>\n<p>For web browser traffic, the data is even grimmer, with DeepSeek traffic down in absolute terms since release. 
The other leading AI model providers have all seen impressive growth in users over the same time frame.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1656\" height=\"798\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/07\/image-24.png\" alt=\"\" class=\"wp-image-150441450\" \/>Source: SemiAnalysis, SimilarWeb<\/p>\n<p>The poor user momentum for DeepSeek-hosted models stands in sharp contrast to third party hosted instances of DeepSeek. 
Aggregate usage of R1 and V3 on third-party hosts continues to grow rapidly, up nearly 20x since R1 was first released.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1099\" height=\"598\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/07\/image-31.png\" alt=\"\" class=\"wp-image-150441469\" \/>Source: SemiAnalysis, OpenRouter<\/p>\n<p>Digging deeper into the data by splitting out just the DeepSeek tokens hosted by the company itself, we can see that DeepSeek\u2019s share of total tokens continues to fall every month.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1044\" height=\"589\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/07\/1751861664_838_image-32.png\" alt=\"\" class=\"wp-image-150441470\" \/>Source: SemiAnalysis, OpenRouter<\/p>\n<p>So why are users shifting away from DeepSeek\u2019s own web app and API service in favor of other open-source providers, despite the rising popularity of DeepSeek\u2019s models and the apparently very cheap price?<\/p>\n<p>The answer lies in tokenomics and the myriad tradeoffs between the KPIs for serving a model. These tradeoffs mean a model\u2019s price per token is an OUTPUT of KPI decisions that can be tuned based on the model provider\u2019s hardware and model setup.<\/p>\n<p>Tokenomics Basics<\/p>\n<p>Tokens are the fundamental building blocks of AI models. AI models learn by reading the internet in token form and produce output in the form of text, audio, image, or action tokens. 
A token is just a bite-sized chunk of text (like \u201cfan\u201d, \u201ctas\u201d, \u201ctic\u201d) that a large language model counts and processes instead of whole words or letters.\u00a0<\/p>\n<p>When Jensen talks about datacenters becoming AI factories, the inputs and outputs of these factories are tokens. Much like a physical factory, AI factories make money with a P x Q equation: P is the price per token and Q is the quantity of input and output tokens.<\/p>\n<p>Unlike a normal factory, the token price is a variable that model providers can solve for based on the other attributes of the model. We list the key KPIs below:<\/p>\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>Latency or Time-to-First-Token<\/strong>: how long the model takes to produce its first token, approximately the time needed to complete the prefill stage (i.e. encoding input tokens into the KV cache) and begin the decode stage.<\/li>\n<\/ol>\n<ol start=\"2\" class=\"wp-block-list\">\n<li><strong>Interactivity<\/strong>: how fast each token is produced, often measured in tokens per second per user. Some providers also quote the inverse of interactivity, the average time between output tokens (time per output token, or TPOT). Human reading speed is 3-5 words per second, but most model providers have settled on output speeds of around 20-60 tokens per second.<\/li>\n<\/ol>\n<ol start=\"3\" class=\"wp-block-list\">\n<li><strong>Context Window<\/strong>: how many tokens can be held in the \u2018short-term memory\u2019 of the model before earlier tokens are evicted and the model \u2018forgets\u2019 the older parts of the conversation. Different use cases require different context windows. 
Large document and code base analysis benefit from larger context windows that allow the model to reason coherently over the data.<\/li>\n<\/ol>\n<p>For any given model, you can manipulate these 3 KPIs to produce effectively any price per token. It is therefore not always productive or practical to discuss tokens purely on price per million tokens ($\/Mtok), as this ignores the nature of the workload and the requirements of the token user.<\/p>\n<p>DeepSeek Trade-Offs<\/p>\n<p>Now let\u2019s look at the tokenomics of how DeepSeek serves its R1 model to understand why they have been losing market share on their own model.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1385\" height=\"809\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/07\/image-33.png\" alt=\"\" class=\"wp-image-150441471\" \/>Source: <a href=\"https:\/\/openrouter.ai\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/openrouter.ai\/<\/a> accessed in May 2025. Blended $\/Mtok calculated with 3:1 input:output ratio<\/p>\n<p>Plotting Latency against Price, we can see that DeepSeek\u2019s own service is no longer the cheapest for its latency. In fact, a big reason DeepSeek is able to price its product so cheaply is that users are forced to wait many seconds before the model responds with the first token, while other providers offer the model at a similar price with far less delay. Token consumers can pay $2-4 for nearly no latency with providers like Parasail or Friendli. Microsoft Azure offers the service for 2.5x more than DeepSeek but with 25s less latency. 
Since we pulled this data, the situation has become even grimmer for DeepSeek, as almost all R1 0528 instances are now hosted with <a href=\"https:\/\/openrouter.ai\/deepseek\/deepseek-r1-0528\" target=\"_blank\" rel=\"noreferrer noopener\">sub-5-second latencies<\/a>.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1385\" height=\"810\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/07\/image-34.png\" alt=\"\" class=\"wp-image-150441472\" \/>Source: <a href=\"https:\/\/openrouter.ai\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/openrouter.ai\/<\/a> accessed in May 2025. Blended $\/Mtok calculated with 3:1 input:output ratio; bubble size represents context window size<\/p>\n<p>Using the same plot but sizing each bubble by context window, we can see another tradeoff DeepSeek makes to deliver a very cheap model with limited inference compute resources. 
They run a 64K context window, one of the smallest among major model providers. Smaller context windows limit use cases like coding, which require the model to coherently remember a large number of tokens across a code base in order to reason over it. At the same price you can get &gt;2.5x the context size from providers like Lambda and Nebius in the above chart.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"907\" height=\"614\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/07\/Screenshot-332.png\" alt=\"\" class=\"wp-image-150441398\" \/>Source: <a href=\"https:\/\/semianalysis.com\/2025\/05\/23\/amd-vs-nvidia-inference-benchmark-who-wins-performance-cost-per-million-tokens\/\" target=\"_blank\" rel=\"noreferrer noopener\">SemiAnalysis benchmarks<\/a><\/p>\n<p>Digging into hardware, the above <a href=\"https:\/\/semianalysis.com\/2025\/05\/23\/amd-vs-nvidia-inference-benchmark-who-wins-performance-cost-per-million-tokens\/\" target=\"_blank\" rel=\"noreferrer noopener\">benchmarking of AMD and NVIDIA<\/a> chips on DeepSeek V3 shows how providers solve for $\/Mtok: by batching more users simultaneously on a single GPU or cluster of GPUs, a provider can INCREASE the total wait experienced by the end user, via higher latency and slower interactivity (measured on the x-axis as median end-to-end latency per user), to DECREASE the total cost per token. Higher batch sizes and slower interactivity reduce the cost per token at the expense of a much worse user experience.<\/p>\n<p>To be clear, this is an active decision by DeepSeek. They are not interested in making money off users or in serving them lots of tokens via a chat app or an API service. The company is singularly focused on reaching AGI and is not interested in end user experience.<\/p>\n<p>Batching at extremely high rates allows them to use the minimum amount of compute for inference and external usage, keeping the maximum amount of compute internal for research and development. <a href=\"https:\/\/semianalysis.com\/2025\/06\/08\/scaling-reinforcement-learning-environments-reward-hacking-agents-scaling-data\/#rl-is-an-inference-game-but-china-lacks-the-chips\" target=\"_blank\" rel=\"noreferrer noopener\">As we have previously discussed<\/a>, export controls have limited the Chinese ecosystem\u2019s capability in serving models. As such, it makes sense for DeepSeek to open source: whatever compute they have is kept internal, while other clouds host their model, winning them mindshare and global adoption. 
While the export controls have greatly limited China\u2019s capability in inferencing models at scale, we do not believe they have equally hindered its ability to train useful models, as evidenced by recent releases from <a href=\"https:\/\/github.com\/Tencent-Hunyuan\/Hunyuan-A13B\" target=\"_blank\" rel=\"noreferrer noopener\">Tencent<\/a>, <a href=\"https:\/\/qwenlm.github.io\/blog\/qwen3\/\" target=\"_blank\" rel=\"noreferrer noopener\">Alibaba<\/a>, <a href=\"https:\/\/ernie.baidu.com\/blog\/posts\/ernie4.5\/\" target=\"_blank\" rel=\"noreferrer noopener\">Baidu<\/a>, and even <a href=\"https:\/\/github.com\/rednote-hilab\/dots.llm1\" target=\"_blank\" rel=\"noreferrer noopener\">Rednote<\/a>.<\/p>\n<p>Anthropic Is More Like DeepSeek Than They\u2019d Like to Admit<\/p>\n<p>In the world of AI, the only thing that matters is compute. Like DeepSeek, Anthropic is compute constrained. Anthropic has focused its product development on code and has seen strong adoption among coding applications like Cursor. We think Cursor usage is the ultimate eval, as it represents what users care about most: <strong>cost<\/strong> and <strong>experience<\/strong>. Anthropic has ranked first there for over a year now, which is decades in the AI industry.<\/p>\n<p>Having noticed the success of token consumers like Cursor, the company launched Claude Code, a coding tool built into the terminal. Claude Code usage has skyrocketed, leaving OpenAI\u2019s Codex in the dust.<\/p>\n<p>Google, in response, also released its own tool: Gemini CLI. 
While it is a similar coding tool to Claude Code, Google uses its compute advantage with TPUs to offer unbelievably large request limits at no cost to users.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1403\" height=\"737\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/07\/Screenshot-344.png\" alt=\"\" class=\"wp-image-150441404\" \/>Source: Google<\/p>\n<p>Claude Code, for all of its wonderful performance and design, is <strong>expensive.<\/strong> In many ways, the success of Anthropic\u2019s models in code has placed significant stress on the company. <strong>They are squeezed tight on compute.<\/strong><\/p>\n<p>This is most evident in Claude 4 Sonnet\u2019s output speed on the API. Since the launch of Claude 4 Sonnet, the speed has decreased by 40% to just above 45 tokens per second. 
The reason for this is not unlike DeepSeek\u2019s \u2013 to manage all the incoming requests with the available compute, they have to batch at higher rates. Coding usage also tends to skew toward larger-token-count conversations, which worsens the crunch on compute resources compared to lower-token-count casual chat applications. Regardless, comparable models like o3 and Gemini 2.5 Pro run at significantly faster speeds, reflecting the much larger compute resources at OpenAI and Google.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1718\" height=\"1005\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/07\/image-30.png\" alt=\"\" class=\"wp-image-150441468\" \/>Source: SemiAnalysis, Artificial Analysis<\/p>\n<p>Anthropic is focused on acquiring more compute, striking a major deal with Amazon, which we have covered before.<\/p>\n<p>Anthropic is getting more than half a 
million Trainium chips, which will be used for both inference and training. This relationship is still a work in progress as, despite popular opinion, Claude 4 was not pretrained on AWS Trainium; it was trained on GPUs and TPUs.<\/p>\n<p>Anthropic also turned to its other major investor, Google, for compute. Anthropic rents significant amounts of compute from GCP, specifically TPUs. Following this success, Google Cloud is expanding its offerings to other AI companies, striking a deal with OpenAI. Contrary to previous reporting, Google is only renting GPUs to OpenAI \u2013 not TPUs.<\/p>\n<p>Speed Can Be Compensated For<\/p>\n<p>Claude\u2019s speed is indicative of Anthropic\u2019s compute constraints, but its UX is generally better than DeepSeek\u2019s. First, the speed, despite being low, is faster than DeepSeek\u2019s 25 tokens per second. Second, Anthropic\u2019s models require significantly fewer tokens than other models to answer a question, so despite the lower speed, users experience a significantly shorter end-to-end response time.<\/p>\n<p>While this can depend on workload, Gemini 2.5 Pro and DeepSeek R1-0528 are more than 3 times as wordy as Claude. Gemini 2.5 Pro, Grok 3, and DeepSeek R1 used significantly more tokens to run Artificial Analysis\u2019 intelligence index, which aggregates several varied benchmark scores. Indeed, Claude has the lowest total output token count among leading reasoning models and showed an impressive improvement over Claude 3.7 Sonnet.<\/p>\n<p>This aspect of tokenomics shows that there are many dimensions on which providers are working to improve models. 
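<\/p>
<p>The end-to-end arithmetic behind this can be sketched directly; the token counts and speeds below are illustrative assumptions, not measured values:<\/p>

```python
# End-to-end response time = time-to-first-token + output_tokens / speed.
# A terser model can finish sooner even at a lower decode speed.
def e2e_seconds(ttft, output_tokens, tokens_per_sec):
    return ttft + output_tokens / tokens_per_sec

terse = e2e_seconds(1.0, 400, 45)    # lower speed, far fewer tokens
wordy = e2e_seconds(1.0, 1300, 90)   # 2x the speed but ~3x the tokens
print(terse < wordy)  # True: the terser model responds first
```

<p>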
It is not just more intelligence, but more intelligence <strong>per token produced.<\/strong><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"1245\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/07\/image-26-scaled.png\" alt=\"\" class=\"wp-image-150441464\" \/>Source: Artificial Analysis Intelligence Index, SemiAnalysis<\/p>\n<p>Rise of the Inference Clouds<\/p>\n<p>With the meteoric rise of Cursor, Windsurf, Replit, Perplexity, and other \u201cGPT Wrappers\u201d or AI token-powered apps hitting mainstream recognition, we are seeing more and more companies emulate Anthropic\u2019s focus on selling tokens as a service, rather than bundling them into a monthly subscription like ChatGPT.<\/p>\n<p>Next, we will explore what is next for DeepSeek and address rumors of a delayed R2.<\/p>\n<p>Subscribe for full access to this article<\/p>\n<p 
class=\"has-text-align-center\">With a SemiAnalysis subscription you\u2019ll get access to newsletter articles and article discussions.<\/p>\n<p class=\"has-text-align-center has-small-font-size\">Model access not included \u2013 please reach out to <a href=\"https:\/\/semianalysis.com\/2025\/07\/03\/deepseek-debrief-128-days-later\/mailto:sales@semianalysis.com\" target=\"_blank\" rel=\"noopener\">sales@semianalysis.com<\/a> for our institutional offerings.<\/p>\n<p>Please verify your email address to proceed.<\/p>\n<p class=\"has-text-align-center\">By subscribing, you agree to the\u00a0<a href=\"https:\/\/semianalysis.com\/privacy-policy\/\" target=\"_blank\" rel=\"noopener\">Privacy Policy<\/a>\u00a0and\u00a0<a href=\"https:\/\/semianalysis.com\/terms-of-service\/\" target=\"_blank\" rel=\"noopener\">Terms and Conditions<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"SemiAnalysis is hiring an analyst in New York City for Core Research, our world class research product for&hellip;\n","protected":false},"author":2,"featured_media":244347,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3163],"tags":[323,1942,53,16,15],"class_list":{"0":"post-244346","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-technology","11":"tag-uk","12":"tag-united-kingdom"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@uk\/114810012987595843","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/244346","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embed
dable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/comments?post=244346"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/244346\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media\/244347"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media?parent=244346"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/categories?post=244346"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/tags?post=244346"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}