{"id":192995,"date":"2025-06-18T00:07:10","date_gmt":"2025-06-18T00:07:10","guid":{"rendered":"https:\/\/www.europesays.com\/uk\/192995\/"},"modified":"2025-06-18T00:07:10","modified_gmt":"2025-06-18T00:07:10","slug":"biotech-firm-aims-to-create-chatgpt-of-biology-will-it-work","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/uk\/192995\/","title":{"rendered":"Biotech firm aims to create \u2018ChatGPT of biology\u2019 \u2013 will it work?"},"content":{"rendered":"<p><img decoding=\"async\" class=\"Image\" alt=\"\" width=\"1350\" height=\"900\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/06\/SEI_255742829.jpg\"   loading=\"eager\" fetchpriority=\"high\" data-image-context=\"Article\" data-image-id=\"2484687\" data-caption=\"Basecamp researchers gathering genetic data in Malta\" data-credit=\"Greg Funnell\"\/><\/p>\n<p class=\"ArticleImageCaption__Title\">Basecamp researchers gathering genetic data in Malta<\/p>\n<p class=\"ArticleImageCaption__Credit\">Greg Funnell<\/p>\n<p>A British biotech firm called Basecamp Research has spent the past few years collecting troves of genetic data from microbes living in extreme environments around the world, identifying more than a million species and nearly 10 billion genes new to science. It claims that this massive database of the planet\u2019s biodiversity will help train a \u201cChatGPT of biology\u201d that will answer questions about life on Earth \u2013 but there\u2019s no guarantee this will work.<\/p>\n<p><a href=\"https:\/\/www.dsmz.de\/dsmz\/portrait\/management\" target=\"_blank\" rel=\"noopener\">J\u00f6rg Overmann<\/a> at the Leibniz Institute DSMZ in Germany, which houses one of the world\u2019s most diverse collections of microbial cultures, says increasing the number of known genetic sequences is valuable, but this may not yield useful findings in areas such as drug discovery or chemistry without more information about the organisms from which the sequences were collected. 
\u201cI\u2019m not convinced that in the end the understanding of really novel functions will be accelerated by this brute-force increase in the sequence space,\u201d he says.<\/p>\n<p>Recent years have seen researchers develop a number of machine learning models trained to identify patterns and predict relationships amid vast amounts of biological data. The most famous of these is <a href=\"https:\/\/www.newscientist.com\/article\/2331479-alphafold-why-deepminds-protein-folding-ai-is-transformational\/\" target=\"_blank\" rel=\"noopener\">AlphaFold<\/a>, which can predict the 3D structure of a protein from its amino acid sequence, and which earned its creators at Google DeepMind a share of the 2024 Nobel prize in chemistry.<\/p>\n<p>Such \u201cgenerative biology\u201d models have since grown ever more complex, but they haven\u2019t improved much, says <a href=\"https:\/\/www.francesding.com\/\" target=\"_blank\" rel=\"noopener\">Frances Ding<\/a> at the University of California, Berkeley. One reason could be a lack of biodiverse data. \u201cCurrent models in biology are trained on datasets that disproportionately represent well-studied species (e.g., E. coli, mice, humans), and these models are worse at predicting properties about sequences from other parts of the tree of life,\u201d she says.<\/p>\n<p>Researchers at Basecamp set out to address this biodiversity gap. The company\u2019s growing database now contains samples from more than 120 sites in 26 countries, according to a <a href=\"https:\/\/basecamp-research.com\/data\/\" target=\"_blank\" rel=\"noopener\">report<\/a> the company posted. <a href=\"https:\/\/www.linkedin.com\/in\/jonathan-john-finn-8b3bb23\" target=\"_blank\" rel=\"noopener\">Jonathan Finn<\/a>, the company\u2019s chief science officer, says the collection efforts focused on extreme environments that hadn\u2019t yet been widely sampled, ranging from the frigid water beneath Arctic sea ice to jungle hot springs. 
\u201cMost of the samples that we\u2019ve been going after are prokaryotic samples: bacteria, microbes and their viruses,\u201d says Finn. \u201cI know we\u2019ve got some fungi in there.\u201d<\/p>\n<p>Based on differences in genes shared nearly universally across the tree of life, the company estimates that its samples contain genetic material from more than 1 million species that don\u2019t occur in the public genomic datasets used to train AI biology models. Together, these species carry around 9.8 billion newly identified genes, each of which encodes a potentially useful protein; the researchers say this represents a 10-fold increase in the total number of known genes.<\/p>\n<p>\u201cBy showing these models a large piece of nature, they should have a better understanding of how biology works,\u201d says Finn. \u201cWe\u2019re trying to build a ChatGPT of biology.\u201d<\/p>\n<p>By some estimates, Earth hosts as <a href=\"https:\/\/www.pnas.org\/doi\/10.1073\/pnas.1521291113\" target=\"_blank\" rel=\"noopener\">many as a trillion microbial species<\/a>, almost none of which are well characterised. So, it\u2019s not hugely surprising the company identified so much new life. \u201cIt\u2019s almost inevitable that if you explore more you get more different gene variants,\u201d says <a href=\"https:\/\/www.sanger.ac.uk\/person\/parts-leopold\/\" target=\"_blank\" rel=\"noopener\">Leopold Parts<\/a> at the Wellcome Sanger Institute, UK.<\/p>\n<p>But Basecamp is banking on the idea that all the new material could be valuable \u2013 and it\u2019s not alone. \u201cThis is one of the most exciting things I\u2019ve seen in a long time,\u201d says <a href=\"https:\/\/www.gene.com\/scientists\/our-scientists\/nathan-frey\" target=\"_blank\" rel=\"noopener\">Nathan Frey<\/a>, a machine learning researcher at Genentech, a biotech firm in the US. 
In general, he says, work on AI models for biology has focused on improving algorithms or generating more data in the lab, rather than on going out into the world to collect samples.<\/p>\n<p>However, there is reason to be sceptical that the database will lead to the radically improved models the company wants. For one, it remains unclear to what extent this new diversity of proteins represents valuable new functions, such as plastic-eating enzymes or proteins that could be repurposed for gene editing. \u201cThey have to show that this novelty is useful in some way,\u201d says Parts.<\/p>\n<p>Further, if the new genes really are substantially different from those we already know, Overmann doesn\u2019t see how existing tools can easily predict their functions, or how the data can be used for training a new model. \u201cYou don\u2019t have any clue what the majority of the genes do,\u201d he says. The company could well have assembled a treasure trove of new biology, but without more old-fashioned laboratory work to understand what\u2019s there it may remain mysterious, even to the most powerful AI.<\/p>\n","protected":false},"excerpt":{"rendered":"A British biotech firm called Basecamp Research has 
spent&hellip;\n","protected":false},"author":2,"featured_media":192996,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3846],"tags":[4675,267,11476,70,16,15],"class_list":{"0":"post-192995","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-genetics","8":"tag-biodiversity","9":"tag-genetics","10":"tag-microbiology","11":"tag-science","12":"tag-uk","13":"tag-united-kingdom"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@uk\/114701457532796420","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/192995","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/comments?post=192995"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/192995\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media\/192996"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media?parent=192995"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/categories?post=192995"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/tags?post=192995"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}