{"id":155441,"date":"2025-06-03T18:17:12","date_gmt":"2025-06-03T18:17:12","guid":{"rendered":"https:\/\/www.europesays.com\/uk\/155441\/"},"modified":"2025-06-03T18:17:12","modified_gmt":"2025-06-03T18:17:12","slug":"deepseek-may-have-used-googles-gemini-to-train-its-latest-model","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/uk\/155441\/","title":{"rendered":"DeepSeek may have used Google&#8217;s Gemini to train its latest model"},"content":{"rendered":"<p id=\"speakable-summary\" class=\"wp-block-paragraph\">Last week, Chinese lab DeepSeek released an <a href=\"https:\/\/techcrunch.com\/2025\/05\/28\/deepseek-updates-its-r1-reasoning-ai-model-releases-it-on-hugging-face\/\" target=\"_blank\" rel=\"noopener\">updated version of its R1 reasoning AI model<\/a> that performs well on a number of math and coding benchmarks. The company didn\u2019t reveal the source of the data it used to train the model, but some AI researchers speculate that at least a portion came from Google\u2019s Gemini family of AI.<\/p>\n<p class=\"wp-block-paragraph\">Sam Paech, a Melbourne-based developer who creates \u201cemotional intelligence\u201d evaluations for AI, published what he claims is evidence that DeepSeek\u2019s latest model was trained on outputs from Gemini. 
DeepSeek\u2019s model, called R1-0528, prefers words and expressions similar to those that Google\u2019s Gemini 2.5 Pro favors, said Paech in an <a href=\"https:\/\/x.com\/sam_paech\/status\/1928187246689112197?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwterm%5E1928187246689112197%7Ctwgr%5Eeb145a37f745de501be925d1a9e8050a25863e82%7Ctwcon%5Es1_c10&amp;ref_url=https%3A%2F%2Ftechcrunch.com%2Fwp-admin%2Fpost.php%3Fpost%3D3014600action%3Dedit\" target=\"_blank\" rel=\"noreferrer noopener\">X post<\/a>.<\/p>\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">If you&#8217;re wondering why new deepseek r1 sounds a bit different, I think they probably switched from training on synthetic openai to synthetic gemini outputs. <a rel=\"nofollow\" href=\"https:\/\/t.co\/Oex9roapNv\">pic.twitter.com\/Oex9roapNv<\/a><\/p>\n<p>\u2014 Sam Paech (@sam_paech) <a rel=\"nofollow noopener\" href=\"https:\/\/twitter.com\/sam_paech\/status\/1928187246689112197?ref_src=twsrc%5Etfw\" target=\"_blank\">May 29, 2025<\/a><\/p><\/blockquote>\n<p class=\"wp-block-paragraph\">That\u2019s not a smoking gun. But another developer, the pseudonymous creator of a \u201cfree speech eval\u201d for AI called <a href=\"https:\/\/techcrunch.com\/2025\/04\/16\/theres-now-a-benchmark-for-how-free-an-ai-chatbot-is-to-talk-about-controversial-topics\/\" target=\"_blank\" rel=\"noopener\">SpeechMap<\/a>, noted the DeepSeek model\u2019s traces \u2014 the \u201cthoughts\u201d the model generates as it works toward a conclusion \u2014 \u201cread like Gemini traces.\u201d<\/p>\n<p class=\"wp-block-paragraph\">DeepSeek has been accused of training on data from rival AI models before. 
In December, developers <a href=\"https:\/\/techcrunch.com\/2024\/12\/27\/why-deepseeks-new-ai-model-thinks-its-chatgpt\/\" target=\"_blank\" rel=\"noopener\">observed<\/a> that DeepSeek\u2019s V3 model often identified itself as ChatGPT, OpenAI\u2019s AI-powered chatbot platform, suggesting that it may\u2019ve been trained on ChatGPT chat logs.<\/p>\n<p class=\"wp-block-paragraph\">Earlier this year, <a href=\"https:\/\/www.ft.com\/content\/a0dfedd1-5255-4fa9-8ccc-1fe01de87ea6\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">OpenAI told the\u00a0Financial Times<\/a> it found evidence linking DeepSeek to the use of distillation, a technique to train AI models by extracting data from bigger, more capable ones. <a href=\"https:\/\/www.bloomberg.com\/news\/articles\/2025-01-29\/microsoft-probing-if-deepseek-linked-group-improperly-obtained-openai-data\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">According to Bloomberg<\/a>, Microsoft, a close OpenAI collaborator and investor, detected that large amounts of data were being exfiltrated through OpenAI developer accounts in late 2024 \u2014 accounts OpenAI believes are affiliated with DeepSeek.<\/p>\n<p class=\"wp-block-paragraph\">Distillation isn\u2019t an uncommon practice, but OpenAI\u2019s terms of service prohibit customers from using the company\u2019s model outputs to build competing AI.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">To be clear, many models <a href=\"https:\/\/www.cnbc.com\/2023\/12\/29\/baidu-says-its-chatgpt-rival-ernie-bot-has-more-than-100-million-users.html\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">misidentify<\/a> <a href=\"https:\/\/www.reddit.com\/r\/ChatGPT\/comments\/1gslm0t\/gemini_models_answer_claude_when_asked_about_its\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">themselves<\/a> and converge on the same words and turns of phrase. 
That\u2019s because the open web, which is where AI companies source the bulk of their training data, is becoming\u00a0<a href=\"https:\/\/www.forbes.com.au\/news\/innovation\/is-ai-quietly-killing-itself-and-the-internet\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">littered<\/a>\u00a0with AI\u00a0<a href=\"https:\/\/www.niemanlab.org\/2022\/12\/im-sorry-but-im-a-large-language-model\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">slop<\/a>. Content farms are using AI to create\u00a0<a href=\"https:\/\/www.nytimes.com\/2023\/05\/19\/technology\/ai-generated-content-discovered-on-news-sites-content-farms-and-product-reviews.html\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">clickbait<\/a>, and bots are flooding\u00a0Reddit\u00a0and\u00a0<a rel=\"nofollow noopener\" href=\"https:\/\/www.theguardian.com\/technology\/2023\/sep\/09\/x-twitter-bots-republican-primary-debate-tweets-increase\" target=\"_blank\">X<\/a>.<\/p>\n<p class=\"wp-block-paragraph\">This \u201ccontamination,\u201d if you will, has made it\u00a0<a href=\"https:\/\/x.com\/TheXeophon\/status\/1872582201919021516\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">quite difficult<\/a>\u00a0to thoroughly filter AI outputs from training datasets.<\/p>\n<p class=\"wp-block-paragraph\">Still, AI experts like Nathan Lambert, a researcher at the nonprofit AI research institute AI2, don\u2019t think it\u2019s out of the question that DeepSeek trained on data from Google\u2019s Gemini.<\/p>\n<p class=\"wp-block-paragraph\">\u201cIf I was DeepSeek, I would definitely create a ton of synthetic data from the best API model out there,\u201d Lambert <a href=\"https:\/\/x.com\/natolambert\/status\/1929895008435306823\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">wrote<\/a> in a post on X. \u201c[DeepSeek is] short on GPUs and flush with cash. 
It\u2019s literally effectively more compute for them.\u201d<\/p>\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">If I was DeepSeek I would definitely create a ton of synthetic data from the best API model out there. Theyre short on GPUs and flush with cash. It\u2019s literally effectively more compute for them. yes on the Gemini distill question.<\/p>\n<p>\u2014 Nathan Lambert (@natolambert) <a rel=\"nofollow noopener\" href=\"https:\/\/twitter.com\/natolambert\/status\/1929895008435306823?ref_src=twsrc%5Etfw\" target=\"_blank\">June 3, 2025<\/a><\/p><\/blockquote>\n<p class=\"wp-block-paragraph\">Partly in an effort to prevent distillation, AI companies have been ramping up security measures. <\/p>\n<p class=\"wp-block-paragraph\">In April, OpenAI began <a href=\"https:\/\/techcrunch.com\/2025\/04\/13\/access-to-future-ai-models-in-openais-api-may-require-a-verified-id\/\" target=\"_blank\" rel=\"noopener\">requiring<\/a> organizations to complete an ID verification process in order to access certain advanced models. The process requires a government-issued ID from one of the countries supported by OpenAI\u2019s API; China isn\u2019t on the list.<\/p>\n<p class=\"wp-block-paragraph\">Elsewhere, Google recently began \u201csummarizing\u201d the traces generated by models available through its AI Studio developer platform, a step that makes it more challenging to train performant rival models on Gemini traces. 
Anthropic in May said it would <a href=\"https:\/\/techcrunch.com\/2025\/05\/22\/anthropics-new-claude-4-ai-models-can-reason-over-many-steps\/\" target=\"_blank\" rel=\"noopener\">start to summarize<\/a> its own model\u2019s traces, citing a need to protect its \u201ccompetitive advantages.\u201d<\/p>\n<p class=\"wp-block-paragraph\">We\u2019ve reached out to Google for comment and will update this piece if we hear back.<\/p>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n","protected":false},"excerpt":{"rendered":"Last week, Chinese lab DeepSeek released an updated version of its R1 reasoning AI model that performs well&hellip;\n","protected":false},"author":2,"featured_media":155442,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3163],"tags":[323,1942,3650,2332,53,16,15],"class_list":{"0":"post-155441","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-deepseek","11":"tag-gemini","12":"tag-technology","13":"tag-uk","14":"tag-united-kingdom"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@uk\/114620809991036621","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/155441","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/comments?post=155441"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/155441\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:
\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media\/155442"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media?parent=155441"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/categories?post=155441"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/tags?post=155441"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}