{"id":468327,"date":"2026-05-04T18:46:11","date_gmt":"2026-05-04T18:46:11","guid":{"rendered":"https:\/\/www.europesays.com\/ie\/468327\/"},"modified":"2026-05-04T18:46:11","modified_gmt":"2026-05-04T18:46:11","slug":"the-distillation-panic-by-nathan-lambert","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/ie\/468327\/","title":{"rendered":"The distillation panic &#8211; by Nathan Lambert"},"content":{"rendered":"<p>\u2018Distillation attacks\u2019 is a horrible term for what is happening right now. Yes, some Chinese labs are hacking or jailbreaking APIs to attempt to extract more signal from model APIs \u2014 stopping this is important to maintain the U.S.\u2019s lead in AI capabilities. Referring to this as distillation attack is going to irrevocably associate all distillation with this behavior, and distillation generally is a core technique needed to diffuse AI capabilities broadly through academic and economic activities.<\/p>\n<p>We went through this sort of language transition with the open source vs open weight debate. All the terms just reduced to open models \u2013 very few people in the large AI community know exactly how open-source differs from open-weights. And terminology matters, as the less informed people who still care about \u2014 and influence \u2014 the technology are bound by different terms they use. 
If we\u2019re not careful with the discourse around distillation, many people could come to associate this broad technique, used for research and development of new models, with an act at the boundary of corporate manipulation and crime.<\/p>\n<p>I\u2019ve recently written a more <a href=\"https:\/\/www.interconnects.ai\/p\/how-much-does-distillation-really\" rel=\"nofollow noopener\" target=\"_blank\">technical piece<\/a> on estimating how impactful state-of-the-art distillation methods are on leading Chinese models, and this piece follows it to urge caution against hasty policy actions targeting the method. To set the stage, recall Anthropic\u2019s recent blog post where they <a href=\"https:\/\/www.anthropic.com\/news\/detecting-and-preventing-distillation-attacks\" rel=\"nofollow noopener\" target=\"_blank\">detailed \u201cdistillation attacks\u201d made by three Chinese labs<\/a>.<\/p>\n<blockquote>\n<p>These labs used a technique called \u201cdistillation,\u201d which involves training a less capable model on the outputs of a stronger one. Distillation is a widely used and legitimate training method. For example, frontier AI labs routinely distill their own models to create smaller, cheaper versions for their customers. 
But distillation can also be used for illicit purposes: competitors can use it to acquire powerful capabilities from other labs in a fraction of the time, and at a fraction of the cost, that it would take to develop them independently.<\/p>\n<\/blockquote>\n<p>This is a clever paragraph, where they normalize distillation generally and explain how a few people can use it illicitly, without detailing how illicit use often involves other more explicit behavior like jailbreaking, hacking, or identity spoofing of the API.<\/p>\n<p>Distillation itself is an industry standard. It\u2019s used extensively, primarily in post-training, by smaller players to create specialized or smaller models. In my <a href=\"https:\/\/rlhfbook.com\/c\/12-synthetic-data\" rel=\"nofollow noopener\" target=\"_blank\">book<\/a> coming this summer, I describe it as follows:<\/p>\n<blockquote>\n<p>The term distillation has been the most powerful form of discussion around the role of synthetic data in language models. Distillation as a term comes from a technical definition of teacher-student knowledge distillation from the deep learning literature.<\/p>\n<p>Distillation colloquially refers to using the outputs from a stronger model to train a smaller model.<\/p>\n<p>In post-training, this general notion of distillation takes two common forms:<\/p>\n<ol>\n<li>\n<p>As a data engine to use across wide swaths of the post-training process: Completions for instructions, preference data (or Constitutional AI), or verification for RL.<\/p>\n<\/li>\n<li>\n<p>To transfer specific skills from a stronger model to a weaker model, which is often done for specific skills such as mathematical reasoning or coding.<\/p>\n<\/li>\n<\/ol>\n<\/blockquote>\n<p>With this definition, it\u2019s easy to see how distillation takes many forms. Of course, if you just take the outputs from GPT-5.5 and train a recent open-weight base model with them to host a competitive product, that\u2019s one thing. 
But many of the things that fall under the bucket of distillation are complex, multi-stage processes that muddle the exact impact of the model you distilled from.<\/p>\n<p>A modern LLM pipeline could look like using a GPT API to generate an initial batch of synthetic data for a small, specialized data-processing model. A good example is a model like olmOCR (or many other models in this category) that is trained to convert PDFs to clean text. This specialized model would be used to create large amounts of data. Finally, you train another model (often from scratch) with the new data you created. Is this final model distilled from GPT?<\/p>\n<p>When done via a closed, API-based model, distillation sits in the grey area of the terms of service that you agree to when signing up for the Claude or GPT platform. They generally forbid the use of the API to create competing language model products, but this term has largely gone unenforced. The open-source community used to worry deeply about being cut off from these cutting-edge APIs for doing research or creating public datasets, but to date only <a href=\"https:\/\/www.theverge.com\/2023\/12\/15\/24003151\/bytedance-china-openai-microsoft-competitor-llm\" rel=\"nofollow noopener\" target=\"_blank\">one prominent case of corporate accounts being restricted exists<\/a> (at least until the recent Chinese companies).<\/p>\n<p>This is all to say that distillation is an industry standard technique, and the use of closed APIs to perform distillation has always been a grey area. Nvidia\u2019s latest Nemotron models, as some of the only models with open post-training datasets, are technically in large part distilled from Chinese, open-weight models. The Olmo models we\u2019ve built at Ai2 are distilled from a mix of open and closed models. This grey area was brought to the forefront again when it turned out that xAI had been distilling from OpenAI. 
Quoting from the recent trial <a href=\"https:\/\/x.com\/MTSlive\/status\/2049886679876632724\" rel=\"nofollow\">proceedings<\/a> between Elon and OpenAI:<\/p>\n<blockquote>\n<p>OpenAI\u2019s counsel asked Musk whether xAI has ever \u201cdistilled\u201d technology from OpenAI.<\/p>\n<p>Musk: \u201cGenerally AI companies distill other AI companies.\u201d<\/p>\n<p>\u201cIs that a yes?\u201d Savitt asked.<\/p>\n<p>Musk: \u201cPartly.\u201d<\/p>\n<\/blockquote>\n<p>xAI is likely the largest and most successful AI company willing to tread into the grey area that is distillation from competitors. On the other side, the majority of startups and research groups with fewer resources have very likely engaged in distillation in some capacity from Claude, GPT, or Gemini models.<\/p>\n<p>In the above Anthropic blog post, the problem with the distillation attacks by a few Chinese labs is less the distillation and more the means of attack. It is documented that Chinese labs are actively working to get around the intended use of the API, e.g. to extract additional reasoning data that is very useful for training.<\/p>\n<p>Of course no one should be able to access information from a model that a developer didn\u2019t intend to reveal in their APIs (e.g., reasoning traces which would be helpful for training). Associating all of distillation \u2014 to date an industry standard for post-training, from open and closed models alike \u2014 with these attacks would be a massive own goal.<\/p>\n<p>What these few labs are doing should be referred to as jailbreaking or abuse, rather than distillation.<\/p>\n<p>The discourse around these actions is marching towards a mix of regulatory capture and regulatory exuberance that\u2019s most likely to harm the U.S.\u2019s ecosystem more than China\u2019s. 
Even if we ban this type of API abuse \u2014 most likely through legal action and other penalties \u2014 Chinese companies will likely still do it. We\u2019ve seen this playbook with Chinese multimedia models taking a flexible view of copyrighted content in a way no U.S. player is willing to risk.<\/p>\n<p>This distillation discussion has quickly snowballed, with a <a href=\"https:\/\/www.congress.gov\/bill\/119th-congress\/house-bill\/8283\/text\" rel=\"nofollow noopener\" target=\"_blank\">bill moving out of a committee in Congress<\/a>, an <a href=\"https:\/\/whitehouse.gov\/wp-content\/uploads\/2026\/04\/NSTM-4.pdf\" rel=\"nofollow noopener\" target=\"_blank\">executive order<\/a> pushing for action, and <a href=\"https:\/\/www.semafor.com\/article\/04\/29\/2026\/house-committee-probes-cursor-parent-airbnb-over-chinese-ai\" rel=\"nofollow noopener\" target=\"_blank\">congressional oversight<\/a> targeting U.S. companies building on Chinese models (which are downstream of distillation). This multi-pronged regulatory environment could yield truly horrible outcomes \u2013 such as finding a way to effectively ban open-weight models in the U.S. that are built in China by groups abusing closed LLM APIs.<\/p>\n<p>It is obvious that no bill will literally ban open models, but they can create a grey area that exposes entities to unwanted risk or require certain provisions that are bureaucratically very challenging to fulfill, squashing small open-source contributors.<\/p>\n<p>In that scenario, the groups who lose are Western academics and smaller companies building models for the long tail of AI uses. The ecosystem here could be made permanently irrelevant with the removal of nearly all Chinese open-weight models. There is no immediate substitute, and building new models with meaningful community adoption has a lead time of six-plus months. 
In the time it takes to build a new domestic open-source ecosystem, countless researchers would\u2019ve moved on to closed training platforms or into new areas.<\/p>\n<p>Altogether, I\u2019m hoping this flurry of discussion around distillation becomes a nothing-burger and not a hasty, multi-pronged policy push. We need to avoid two things:<\/p>\n<ol>\n<li>\n<p>A wholesale negative connotation of the word distillation, which is used extensively across the AI ecosystem.<\/p>\n<\/li>\n<li>\n<p>A domestic ban on the open-weight models built by organizations engaged in some portion of distillation.<\/p>\n<\/li>\n<\/ol>\n<p>In addition to this, I want the leading U.S. AI companies to be able to provide their APIs without having their IP leak. They should share more information on why it is hard for them to secure their APIs, but that\u2019s an issue outside the scope of my expertise.<\/p>\n<p>I\u2019ll conclude with a proposal from my friend Kevin Xu at <a href=\"https:\/\/www.interconnectedcapital.com\/\" rel=\"nofollow noopener\" target=\"_blank\">Interconnected Capital<\/a> (and a great <a href=\"https:\/\/interconnect.substack.com\/\" rel=\"nofollow noopener\" target=\"_blank\">Substack<\/a>) on why this current distillation dynamic may actually be good for the leading labs.<\/p>\n<p>If all the Chinese companies are addicted to distillation as a way of getting close to the frontier, then they\u2019ll never actually learn the techniques needed to take an outright lead. If we cut off Chinese labs\u2019 obvious crutch in model building, we\u2019ll gain a short-term lead in AI, but in the long run that may be exactly what they need to get on a more competitive trajectory.<\/p>\n<p>This is the same debate we\u2019re having with other technologies where the U.S. currently has a lead, e.g. with advanced semiconductor technologies. 
So I understand the trade-offs, but we should not crack down on all of distillation.<\/p>\n","protected":false},"excerpt":{"rendered":"\u2018Distillation attacks\u2019 is a horrible term for what is happening right now. Yes, some Chinese labs are hacking&hellip;\n","protected":false},"author":2,"featured_media":468328,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[261],"tags":[291,289,290,18,19,17,82],"class_list":{"0":"post-468327","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-eire","12":"tag-ie","13":"tag-ireland","14":"tag-technology"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@ie\/116517796887957063","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/posts\/468327","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/comments?post=468327"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/posts\/468327\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/media\/468328"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/media?parent=468327"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/categories?post=468327"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/tags?post=468327"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}