{"id":289217,"date":"2025-10-09T12:26:17","date_gmt":"2025-10-09T12:26:17","guid":{"rendered":"https:\/\/www.europesays.com\/us\/289217\/"},"modified":"2025-10-09T12:26:17","modified_gmt":"2025-10-09T12:26:17","slug":"google-for-dna-brings-order-to-biologys-big-data","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/us\/289217\/","title":{"rendered":"\u2018Google for DNA\u2019 brings order to biology\u2019s big data"},"content":{"rendered":"<p> <img decoding=\"async\" class=\"figure__image\" alt=\"Close up view of a researcher's hand interacting with a DNA sequence on a digital display.\" loading=\"lazy\" src=\"https:\/\/www.europesays.com\/us\/wp-content\/uploads\/2025\/10\/d41586-025-03219-w_51546810.jpg\"\/><\/p>\n<p class=\"figure__caption u-sans-serif\">MetaGraph indexes vast archives of DNA, RNA, and protein sequences. Scientists can search the archives and trace biological contexts in big data. Credit: Andrew Brookes\/Connect Images\/Science Photo Library<\/p>\n<p>The Internet has Google. Now biology has MetaGraph. Detailed today in Nature<a href=\"#ref-CR1\" data-track=\"click\" data-action=\"anchor-link\" data-track-label=\"go to reference\" data-track-category=\"references\">1<\/a>, the search engine can quickly sift through the staggering volumes of biological <a href=\"https:\/\/www.nature.com\/articles\/d41586-024-03236-1\" data-track=\"click\" data-label=\"https:\/\/www.nature.com\/articles\/d41586-024-03236-1\" data-track-category=\"body text link\" target=\"_blank\" rel=\"noopener\">data housed in public repositories<\/a>.<\/p>\n<p>\u201cIt\u2019s a huge achievement,\u201d says Rayan Chikhi, a biocomputing researcher at the Pasteur Institute in Paris. 
\u201cThey set a new standard\u201d for analysing raw biological data \u2014 including DNA, RNA and protein sequences \u2014 from databases that can contain millions of billions of DNA letters, amounting to \u2018petabases\u2019 of information, more entries than all the webpages in Google\u2019s vast index.<\/p>\n<p>Although MetaGraph is tagged as \u2018Google for DNA\u2019, Chikhi likens the tool to a search engine for YouTube, because the tasks are more computationally demanding. In the same way that YouTube searches can retrieve every video that features, say, red balloons even when those key words don\u2019t appear in the title, tags or description, MetaGraph can uncover genetic patterns hidden deep within expansive sequencing data sets without needing those patterns to be explicitly annotated in advance.<\/p>\n<p>\u201cIt enables things that cannot be done in any other way,\u201d Chikhi says.<\/p>\n<p>Indexing life\u2019s library<\/p>\n<p>The motivation behind MetaGraph was to address an accessibility problem in <a href=\"https:\/\/www.nature.com\/immersive\/d42859-020-00002-x\/index.html\" data-track=\"click\" data-label=\"https:\/\/www.nature.com\/immersive\/d42859-020-00002-x\/index.html\" data-track-category=\"body text link\" target=\"_blank\" rel=\"noopener\">sequencing data sets<\/a>. The size of these repositories has risen at a blistering pace in the past few decades, but this growth has presented challenges for the scientists using the data they contain. 
Raw sequencing reads are fragmented, noisy and too numerous to search directly. \u201cThe volume of the data, paradoxically, is the main inhibitor of us actually using the data,\u201d says Babaian.<\/p>\n<p>According to the study author, Andr\u00e9 Kahles, a bioinformatician at the Swiss Federal Institute of Technology (ETH) Zurich in Switzerland, MetaGraph could help researchers to ask biological questions of repositories such as the Sequence Read Archive (<a href=\"https:\/\/www.ncbi.nlm.nih.gov\/sra\" data-track=\"click\" data-label=\"https:\/\/www.ncbi.nlm.nih.gov\/sra\" data-track-category=\"body text link\" target=\"_blank\" rel=\"noopener\">SRA<\/a>), a public database containing in excess of 100 million billion DNA letters<a href=\"#ref-CR2\" data-track=\"click\" data-action=\"anchor-link\" data-track-label=\"go to reference\" data-track-category=\"references\">2<\/a>.<\/p>\n<p>The team tackled the problem using mathematical \u2018graphs\u2019 that link overlapping DNA fragments together, much like sentences that share the same words lining up in a book index.<\/p>\n<p>The researchers integrated data from seven publicly funded data repositories, creating 18.8\u2009million unique DNA and RNA sequence sets and 210\u2009billion amino-acid sequence sets across all clades of life \u2014 including viruses, bacteria, fungi, plants and animals, including humans. They also developed a search engine for these sequences, in which users enter text prompts to search these integrated archives of raw data.<\/p>\n<p>\u201cIt is a totally new way to interact with this body of data,\u201d says Kahles. 
\u201cIt\u2019s compressed, but accessible on the fly.\u201d<\/p>\n<p>To demonstrate the utility of MetaGraph, the study authors used it to scan 241,384 human gut microbiome samples for genetic indicators of <a href=\"https:\/\/www.nature.com\/articles\/d41586-025-03218-x\" data-track=\"click\" data-label=\"https:\/\/www.nature.com\/articles\/d41586-025-03218-x\" data-track-category=\"body text link\" target=\"_blank\" rel=\"noopener\">antibiotic resistance around the world<\/a>, building on work that used an earlier version of the tool to track drug-resistance genes in bacterial strains that live in subway systems across major urban centres<a href=\"#ref-CR3\" data-track=\"click\" data-action=\"anchor-link\" data-track-label=\"go to reference\" data-track-category=\"references\">3<\/a>. The authors say they performed the analysis in about an hour on a high-powered computer.<\/p>\n<p>Open road to discovery<\/p>\n","protected":false},"excerpt":{"rendered":"MetaGraph indexes vast archives of DNA, RNA, and protein sequences. 
Scientists can search the archives and trace&hellip;\n","protected":false},"author":3,"featured_media":289218,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[8],"tags":[18531,33720,57953,10046,10047,159,67,132,68],"class_list":{"0":"post-289217","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-science","8":"tag-bioinformatics","9":"tag-databases","10":"tag-dna-sequencing","11":"tag-humanities-and-social-sciences","12":"tag-multidisciplinary","13":"tag-science","14":"tag-united-states","15":"tag-unitedstates","16":"tag-us"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@us\/115344204965887174","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts\/289217","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/comments?post=289217"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts\/289217\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/media\/289218"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/media?parent=289217"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/categories?post=289217"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/tags?post=289217"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}