{"id":14198,"date":"2025-04-12T16:30:28","date_gmt":"2025-04-12T16:30:28","guid":{"rendered":"https:\/\/www.europesays.com\/uk\/14198\/"},"modified":"2025-04-12T16:30:28","modified_gmt":"2025-04-12T16:30:28","slug":"a-redefined-indel-taxonomy-provides-insights-into-mutational-signatures","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/uk\/14198\/","title":{"rendered":"A redefined InDel taxonomy provides insights into mutational signatures"},"content":{"rendered":"<p>Diversity of InDel patterns in PRRd<\/p>\n<p>We generated a \u2018ground truth\u2019 set of isogenic cellular models by introducing CRISPR edits to key PRRd-associated genes in an hTERT-immortalized RPE1 (TP53\u2212\/\u2212) cell line<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 23\" title=\"Zimmermann, M. et al. CRISPR screens identify genomic ribonucleotides as a source of PARP-trapping lesions. Nature 559, 285&#x2013;289 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR23\" id=\"ref-link-section-d131317958e703\" target=\"_blank\" rel=\"noopener\">23<\/a>. We created four single MMR gene knockouts (\u0394MLH1, \u0394MSH2, \u0394MSH3 and \u0394SETD2 (ref. <a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 24\" title=\"Li, F. et al. The histone mark H3K36me3 regulates human DNA mismatch repair through its interaction with MutSalpha. 
Cell 153, 590&#x2013;600 (2013).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR24\" id=\"ref-link-section-d131317958e720\" target=\"_blank\" rel=\"noopener\">24<\/a>)), two knock-in missense mutants of polymerase Pol \u03b5 (POLE exonuclease mutant p.P286R and p.L424V), two mutants of Pol \u03b4 (POLD1 exonuclease mutant p.S478N and polymerase mutant p.R689W) and three double mutants with combined polymerase proofreading mutation and MMRd (POLD1S478N\/+\u0394MLH1, POLD1S478N\/+\u0394MSH2 and POLEP286R\u0394MSH2; Supplementary Tables <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">1<\/a> and <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">2<\/a>). Successfully edited clones were propagated in culture for approximately 45\u201350 days to permit mutation accumulation. Subsequently, two to five daughter subclones were isolated per genotype for whole-genome sequencing (WGS) and mutational signature analyses (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig1\" target=\"_blank\" rel=\"noopener\">1a<\/a>).<\/p>\n<p><b id=\"Fig1\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Fig. 
1: Isogenic PRRd human cell lines exhibit distinct InDel patterns.<\/b><a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41588-025-02152-y\/figures\/1\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig1\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/04\/41588_2025_2152_Fig1_HTML.png\" alt=\"figure 1\" loading=\"lazy\" width=\"685\" height=\"640\"\/><\/a><\/p>\n<p><b>a<\/b>, Mutation accumulation experiment in TP53-null hTERT-immortalized retinal pigment epithelial cells (hTERT-RPE1TP53-null, hereafter referred to as the background control). <b>b<\/b>, InDel burden and average InDel fold increase of CRISPR gene edits (n\u2009=\u20092\u20135 subclones per genotype; Supplementary Tables <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">1<\/a>\u2013<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">3<\/a>). Red dashed line represents the mean InDel burden of control subclones. The y axis shows the InDel burden in log scale. <b>c<\/b>, Distinguishing COSMIC-83 InDel profiles of edited subclones from background control. Light blue error bars depict the mean\u2009\u00b1\u20093\u2009s.d. of cosine similarities between n\u2009=\u2009100 bootstrapped InDel profiles of unedited controls and the background profile (Extended Data Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig5\" target=\"_blank\" rel=\"noopener\">1d<\/a>) aggregated from n\u2009=\u20097 unedited subclones. The x axis shows the InDel count in log scale. <b>d<\/b>, COSMIC-83 InDel mutational signatures associated with gene edits following background subtraction (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">4<\/a>). <b>e<\/b>, Key features of COSMIC ID1, ID2 and ID7 (v.3.3). <b>f<\/b>, Heatmap of cosine similarities between gene-edit IDS and COSMIC IDS (v.3.3). Known or proposed etiologies are annotated above the heatmap (blue). <b>g<\/b>, Decomposed solution of gene-edit InDel signatures in <b>d<\/b> into COSMIC IDS (v.3.3).<\/p>\n<p>Except for \u0394SETD2, we observed elevated InDel burdens in all gene edits compared to an unedited control (background) (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig1\" target=\"_blank\" rel=\"noopener\">1b<\/a> and Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">3<\/a>). The mutation burden was approximately twofold higher in \u0394MSH3 and POLD1R689W, tenfold in POLD1S478N, POLEL424V and POLEP286R, 55-fold in \u0394MSH2 and \u0394MLH1, and highest in combined gene edits: around 200-fold in POLD1S478N\/+\u0394MLH1 and POLEP286R\u0394MSH2, and 300-fold in POLD1S478N\/+\u0394MSH2.<\/p>\n<p>All lines except \u0394SETD2 showed variations in their COSMIC-83 InDel signature profiles compared to control (Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig1\" target=\"_blank\" rel=\"noopener\">1c<\/a> and Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">4<\/a>). We noted discriminative characteristics between gene edits (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig1\" target=\"_blank\" rel=\"noopener\">1d<\/a> and Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig5\" target=\"_blank\" rel=\"noopener\">1<\/a>). Dominant 1\u2009bp T deletions at homopolymers of 6\u2009bp or more (poly-T6+) were observed for \u0394MLH1, \u0394MSH2 and \u0394MSH3, while POLD1S478N and POLEP286R showed exclusive 1\u2009bp T insertions at poly-T5+. POLD1R689W, POLEL424V and all three combined polymerase\/MMRd edits predominantly exhibited 1\u2009bp T insertions at long homopolymers, although not exclusively, with variations of 1\u2009bp T deletions between different genotypes. Together, these experiments revealed unique, diverse InDel signatures among different PRRd mutants. Remarkably, mutations within the same gene but affecting different functional protein domains manifested signature variations (that is, POLD1 exonuclease p.S478N versus polymerase p.R689W).<\/p>\n<p>We also examined the substitution profiles of all gene edits (Extended Data Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig6\" target=\"_blank\" rel=\"noopener\">2<\/a> and Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">5<\/a>). Intriguingly, MMRd lines showed lower substitution-to-InDel ratios compared to control, while polymerase-dysfunction (Pol-dys) lines exhibited markedly increased ratios (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig6\" target=\"_blank\" rel=\"noopener\">2h<\/a>). This suggests that genome instability is predominantly driven by an excess of InDel mutagenesis in MMRd, whereas substitution mutagenesis plays a more significant role in polymerase proofreading dysfunction. Furthermore, mutational asymmetry analyses revealed enrichment of both substitutions and InDels on the leading strand for POLE mutants while POLD1 mutants exhibited lagging strand bias, specifically T insertions at homopolymeric tracts of 5\u20137 nts (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM1\" target=\"_blank\" rel=\"noopener\">1<\/a> and Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">6<\/a>). 
This is in keeping with the hypothesized preferential activity of Pol \u03b5 and Pol \u03b4 in leading and lagging strand synthesis, respectively<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 25\" title=\"Pursell, Z. F., Isoz, I., Lundstrom, E. B., Johansson, E. &amp; Kunkel, T. A. Yeast DNA polymerase epsilon participates in leading-strand DNA replication. Science 317, 127&#x2013;130 (2007).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR25\" id=\"ref-link-section-d131317958e962\" target=\"_blank\" rel=\"noopener\">25<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 26\" title=\"Larrea, A. A. et al. Genome-wide model for the normal eukaryotic DNA replication fork. Proc. Natl Acad. Sci. USA 107, 17674&#x2013;17679 (2010).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR26\" id=\"ref-link-section-d131317958e965\" target=\"_blank\" rel=\"noopener\">26<\/a>, suggesting that POLE\/POLD1 mutants tend to accumulate 1\u2009bp A insertions on the nascent strand while replicating through 5\u20137 nts poly-T-tracts<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 27\" title=\"Korona, D. A., Lecompte, K. G. &amp; Pursell, Z. F. The high fidelity and unique error signature of human DNA polymerase epsilon. Nucleic Acids Res. 39, 1763&#x2013;1773 (2011).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR27\" id=\"ref-link-section-d131317958e975\" target=\"_blank\" rel=\"noopener\">27<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 28\" title=\"Herzog, M. et al. Mutagenic mechanisms of cancer-associated DNA polymerase &#x3F5; alleles. Nucleic Acids Res. 
49, 3919&#x2013;3931 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR28\" id=\"ref-link-section-d131317958e978\" target=\"_blank\" rel=\"noopener\">28<\/a>. This lends support to the proposition that polymerase \u03b5 and \u03b4 are more proficient at detecting incorrectly paired bases at template adenines<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 27\" title=\"Korona, D. A., Lecompte, K. G. &amp; Pursell, Z. F. The high fidelity and unique error signature of human DNA polymerase epsilon. Nucleic Acids Res. 39, 1763&#x2013;1773 (2011).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR27\" id=\"ref-link-section-d131317958e983\" target=\"_blank\" rel=\"noopener\">27<\/a>.<\/p>\n<p>Nevertheless, while the diversity of experimental InDel profiles was appreciable among PRRd genotypes, it was difficult to disambiguate gene-edit signatures from background mutagenesis. Clustering analyses and direct comparisons of gene edit and control InDel profiles showed extremely high similarity (cosine similarity\u2009&gt;\u20090.9; Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig5\" target=\"_blank\" rel=\"noopener\">1a,b,d<\/a>). Discrimination between MMRd and Pol-dys signatures was also limited (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig5\" target=\"_blank\" rel=\"noopener\">1e<\/a>). Unsupervised clustering using cosine distance revealed mainly two groups of signatures\u2014deletion-driven MMRd signatures and insertion-driven polymerase mutant signatures (Extended Data Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig5\" target=\"_blank\" rel=\"noopener\">1f<\/a>). We thus investigated the sufficiency of COSMIC-83 taxonomy, given that signal variation among the ten gene edits was primarily observed in two channels\u20141\u2009bp T insertions at poly-T5+ and 1\u2009bp T deletions at poly-T6+.<\/p>\n<p>Limitations of current InDel taxonomy<\/p>\n<p>We compared experimental gene-edit InDel signatures with COSMIC IDS<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 7\" title=\"Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94&#x2013;101 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR7\" id=\"ref-link-section-d131317958e1012\" target=\"_blank\" rel=\"noopener\">7<\/a>. InDel signatures of \u0394MSH2 and \u0394MLH1 showed no similarity to the purported MMRd-associated ID7 (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig1\" target=\"_blank\" rel=\"noopener\">1f<\/a>). Instead, \u0394MSH2 and \u0394MLH1 signatures most resembled ID1 and ID2, ascribed to normal replication errors associated with nascent and template strand slippage, respectively (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig1\" target=\"_blank\" rel=\"noopener\">1d\u2013g<\/a>).<\/p>\n<p>The COSMIC-83 taxonomy aggregates 1\u2009bp InDels at homopolymers &gt;5\u2009bp into single channels (that is, T6+ for deletions and T5+ for insertions, respectively; Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig1\" target=\"_blank\" rel=\"noopener\">1e<\/a> and Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig7\" target=\"_blank\" rel=\"noopener\">3a\u2013d<\/a>). Yet, in MMRd, the probability of microInDel formation increases with the length of simple nucleotide repeats<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 29\" title=\"Lujan, S. A., Clark, A. B. &amp; Kunkel, T. A. Differences in genome-wide repeat sequence instability conferred by proofreading and mismatch repair defects. Nucleic Acids Res. 43, 4067&#x2013;4074 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR29\" id=\"ref-link-section-d131317958e1048\" target=\"_blank\" rel=\"noopener\">29<\/a>, a recognized hallmark of MSI<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 30\" title=\"Strand, M., Prolla, T. A., Liskay, R. M. &amp; Petes, T. D. Destabilization of tracts of simple repetitive DNA in yeast by mutations affecting DNA mismatch repair. Nature 365, 274&#x2013;276 (1993).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR30\" id=\"ref-link-section-d131317958e1053\" target=\"_blank\" rel=\"noopener\">30<\/a>. We surmised that the conflation of discriminatory signals within longer homopolymers into a single \u2018insertion at T5+\u2019 or \u2018deletion at T6+\u2019 channel likely reduces the separative capacity for signature extraction. Hence, MMRd signatures cannot be distinguished from signatures of normal replication errors. 
This contrasts with corresponding PRRd-associated substitution signatures, which manifest as distinct and diverse patterns amongst MMRd and\/or Pol-dys cancers<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 3\" title=\"Degasperi, A. et al. Substitution mutational signatures in whole-genome-sequenced cancers in the UK population. Science 376, science.abl9283 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR3\" id=\"ref-link-section-d131317958e1061\" target=\"_blank\" rel=\"noopener\">3<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 7\" title=\"Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94&#x2013;101 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR7\" id=\"ref-link-section-d131317958e1064\" target=\"_blank\" rel=\"noopener\">7<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 31\" title=\"Haradhvala, N. J. et al. Distinct mutational signatures characterize concurrent loss of polymerase proofreading and mismatch repair. Nat. Commun. 9, 1746 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR31\" id=\"ref-link-section-d131317958e1067\" target=\"_blank\" rel=\"noopener\">31<\/a> (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig6\" target=\"_blank\" rel=\"noopener\">2b\u2013g<\/a>).<\/p>\n<p>Notably, ID7 lacks signal within reputedly the most informative homopolymer channel (&gt;5\u2009bp). Instead, signals are only present in channels associated with ID1 and ID2 (Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig1\" target=\"_blank\" rel=\"noopener\">1e<\/a>), resulting in systematic misattribution of all MMRd gene-edit signatures to ID1 and ID2 (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig1\" target=\"_blank\" rel=\"noopener\">1f,g<\/a>). Moreover, InDel signatures of POLE, POLD1 mutants and all combined polymerase\/MMRd edits were indistinguishable from ID1 using COSMIC-83 taxonomy and sometimes indistinguishable from each other (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig5\" target=\"_blank\" rel=\"noopener\">1e<\/a>). The signature of polymerase mutant POLD1R689W did not resemble any reported signatures. Because InDel mutagenesis of gene edits occurred predominantly at longer homopolymers and was erroneously assigned to ID1 and\/or ID2 (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig1\" target=\"_blank\" rel=\"noopener\">1g<\/a>), we explored whether expanding on the long homopolymer channels and modifying the information presented in individual InDel channels could improve the resolution to disentangle the ostensibly alike but distinct biological signatures without compromising the power for signature extraction.<\/p>\n<p>A new framework for classifying InDels<\/p>\n<p>As with substitutions, incorporating surrounding sequence characteristics may enhance the discriminatory capacity of InDel catalogs for signature analyses<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 32\" title=\"Tanay, A. 
&amp; Siggia, E. D. Sequence context affects the rate of short insertions and deletions in flies and primates. Genome Biol. 9, R37 (2008).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR32\" id=\"ref-link-section-d131317958e1105\" target=\"_blank\" rel=\"noopener\">32<\/a>. We first classified InDels according to whether they were insertions, deletions or complex InDels (simultaneous insertions and deletions; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig2\" target=\"_blank\" rel=\"noopener\">2a<\/a>). Within insertions and deletions, InDels were subclassified by motif size (1\u2009bp versus \u22652\u2009bp). For 1\u2009bp InDels, we considered the nucleotide content (C\/G versus A\/T motifs), the 5\u2032 and 3\u2032 flanking bases and the length of homopolymeric tracts. For InDels \u22652\u2009bp, we identified the maximally repetitive motif within the InDel and accounted for its repeat length in the 3\u2032 sequence (Supplementary Note <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM1\" target=\"_blank\" rel=\"noopener\">1<\/a>). For deletions occurring with microhomology at the InDel junction, we considered the deletion motif length (L) and the microhomology length (M). 
This comprehensive taxonomy yielded 476 nonoverlapping InDel subcategories (channels; Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">7<\/a> and <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM1\" target=\"_blank\" rel=\"noopener\">Supplementary Note<\/a>).<\/p>\n<p><b id=\"Fig2\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Fig. 2: Redefined InDel taxonomy improves discriminatory power and reveals differential InDel patterns associated with PRR gene edits.<\/b><a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41588-025-02152-y\/figures\/2\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig2\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/04\/41588_2025_2152_Fig2_HTML.png\" alt=\"figure 2\" loading=\"lazy\" width=\"685\" height=\"660\"\/><\/a><\/p>\n<p><b>a<\/b>, Proposed InDel classification schema and an example 89-channel InDel profile of an \u0394MSH2 subclone. <b>b<\/b>, Distinguishing 89-channel InDel profiles of edited subclones from background control. Light blue error bars depict the mean\u2009\u00b1\u20093\u2009s.d. of cosine similarities between n\u2009=\u2009100 bootstrapped InDel profiles of unedited controls and the background profile (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig9\" target=\"_blank\" rel=\"noopener\">5b<\/a>) aggregated from n\u2009=\u20097 unedited subclones. 
The x axis shows the InDel count in log scale. <b>c<\/b>, Cosine similarities of edited subclones and bootstrapped controls in COSMIC-83 InDel profiles against 89-channel InDel profiles (two-tailed Wilcoxon signed-rank test, P\u2009=\u20091.917\u2009\u00d7\u200910<sup>\u22127<\/sup>). <b>d<\/b>, The 89-channel InDel mutational signatures associated with PRRd gene edits following background subtraction (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">4<\/a>; <a href=\"https:\/\/signal.mutationalsignatures.com\/explore\/main\/experimental\/experiments?study=7\" target=\"_blank\" rel=\"noopener\">https:\/\/signal.mutationalsignatures.com\/explore\/main\/experimental\/experiments?study=7<\/a>). Ins, insertion; Del, deletion.<\/p>\n<p>We examined whether all 476 channels were informative. By analyzing the InDel distribution across all channels in 18,522 tumors covering most cancer types from the International Cancer Genome Consortium (ICGC)\/The Cancer Genome Atlas (TCGA)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 33\" title=\"The ICGC\/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 578, 82&#x2013;93 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR33\" id=\"ref-link-section-d131317958e1194\" target=\"_blank\" rel=\"noopener\">33<\/a>, Hartwig<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 34\" title=\"Priestley, P. et al. Pan-cancer whole-genome analyses of metastatic solid tumours. 
Nature 575, 210&#x2013;216 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR34\" id=\"ref-link-section-d131317958e1198\" target=\"_blank\" rel=\"noopener\">34<\/a> and the Genomics England (GEL) 100,000 Genomes Project<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 35\" title=\"Turnbull, C. Introducing whole-genome sequencing into routine cancer care: the Genomics England 100 000 Genomes Project. Ann. Oncol. 29, 784&#x2013;787 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR35\" id=\"ref-link-section-d131317958e1202\" target=\"_blank\" rel=\"noopener\">35<\/a> (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig7\" target=\"_blank\" rel=\"noopener\">3e<\/a>), we identified noninformative channels (that is, channels with no signal) and consolidated those with low signal to reduce the total number of InDel channels to 89 (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig2\" target=\"_blank\" rel=\"noopener\">2a<\/a>, Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig7\" target=\"_blank\" rel=\"noopener\">3f<\/a> and Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">8<\/a>). 
Overall, compared to COSMIC-83, the 89-channel taxonomy expands the channels that carried most of the signal, here 1\u2009bp A\/T InDels, into a larger array of channels, and condenses longer InDels and\/or motifs infrequent in the genome (where signals were scant or nonexistent) into fewer InDel subcategories (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig8\" target=\"_blank\" rel=\"noopener\">4<\/a>). Although the final numbers are not vastly different between the two classification systems, our data-driven approach, incorporating sequence contexts and enhancing signal distribution of mononucleotide\/polynucleotide repeat tracts into additional channels, provides alternative information to the mutational signature extraction and assignment process, potentially increasing the likelihood of detecting new biologically meaningful signatures.<\/p>\n<p>To test this, we applied the new 89-channel InDel taxonomy to our ground truth gene-edit dataset (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">4<\/a>). Cosine similarities between experimental InDel profiles and control were much lower with the 89-channel format than with COSMIC-83 (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig2\" target=\"_blank\" rel=\"noopener\">2b,c<\/a> and Extended Data Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig9\" target=\"_blank\" rel=\"noopener\">5a,b<\/a>), indicating that the new classification improved separation of gene edits from the background (mean cosine similarity, 0.68\u2009\u00b1\u20090.08 for 89-channel versus 0.89\u2009\u00b1\u20090.11 for COSMIC-83; two-tailed Wilcoxon signed-rank test, P\u2009=\u20091.917\u2009\u00d7\u200910<sup>\u22127<\/sup>). We subsequently determined signatures associated with each gene edit using the 89-channel format. The resultant signatures showed more evenly distributed signals across the entire 89-channel profile (Figs. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig1\" target=\"_blank\" rel=\"noopener\">1d<\/a>, <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig2\" target=\"_blank\" rel=\"noopener\">2d<\/a> and Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig9\" target=\"_blank\" rel=\"noopener\">5c<\/a>). Gene-edit signatures were also more readily discernible from one another (mean signature pairwise cosine similarity, 0.57\u2009\u00b1\u20090.25 for 89-channel versus 0.64\u2009\u00b1\u20090.3 for COSMIC-83; two-tailed Wilcoxon signed-rank test, P\u2009=\u20091.483\u2009\u00d7\u200910<sup>\u22125<\/sup>; Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig9\" target=\"_blank\" rel=\"noopener\">5d,e<\/a>). 
Notably, InDel signatures of combined MMRd\/polymerase mutants were not simply the sum of the individual mutational processes, likely reflecting the biological interactions of Pol \u03b5 and Pol \u03b4 with MMR in suppressing InDel formation during the replication of repetitive DNA.<\/p>\n<p>Interestingly, we noted that while MMRd deletions are particularly amplified at longer homopolymers (8\u20139\u2009bp\u2009&gt;\u20095\u20137\u2009bp\u2009&gt;\u20090\u20134\u2009bp), polymerase mutants displayed a strikingly different distribution of excess insertion mutagenesis at shorter homopolymers (5\u20137\u2009bp\u2009&gt;\u20098\u20139\u2009bp\u2009&gt;\u20090\u20134\u2009bp; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig2\" target=\"_blank\" rel=\"noopener\">2d<\/a>). The higher InDel rates in shorter homopolymers conferred by defective proofreading of Pol \u03b5 and Pol \u03b4 likely reflect the distance over which they interact with duplex DNA upstream of the polymerase active site<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 29\" title=\"Lujan, S. A., Clark, A. B. &amp; Kunkel, T. A. Differences in genome-wide repeat sequence instability conferred by proofreading and mismatch repair defects. Nucleic Acids Res. 43, 4067&#x2013;4074 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR29\" id=\"ref-link-section-d131317958e1263\" target=\"_blank\" rel=\"noopener\">29<\/a>. Indeed, crystal structures of Pol \u03b5 and Pol \u03b4 have shown numerous contacts made within 5\u20137\u2009bp of the polymerase active sites with duplex DNA<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 36\" title=\"Hogg, M. et al. 
Structural basis for processive DNA synthesis by yeast DNA polymerase varepsilon. Nat. Struct. Mol. Biol. 21, 49&#x2013;55 (2014).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR36\" id=\"ref-link-section-d131317958e1267\" target=\"_blank\" rel=\"noopener\">36<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 37\" title=\"Swan, M. K., Johnson, R. E., Prakash, L., Prakash, S. &amp; Aggarwal, A. K. Structural basis of high-fidelity DNA synthesis by yeast DNA polymerase delta. Nat. Struct. Mol. Biol. 16, 979&#x2013;986 (2009).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR37\" id=\"ref-link-section-d131317958e1270\" target=\"_blank\" rel=\"noopener\">37<\/a>, with experimental model reinforcing this optimal distance<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 29\" title=\"Lujan, S. A., Clark, A. B. &amp; Kunkel, T. A. Differences in genome-wide repeat sequence instability conferred by proofreading and mismatch repair defects. Nucleic Acids Res. 43, 4067&#x2013;4074 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR29\" id=\"ref-link-section-d131317958e1274\" target=\"_blank\" rel=\"noopener\">29<\/a>, explaining how proofreading may offer reduced protection against InDels outside of this \u2018footprint\u2019 (that is, unpaired bases further upstream of the active site<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 38\" title=\"Kroutil, L. C., Register, K., Bebenek, K. &amp; Kunkel, T. A. Exonucleolytic proofreading during replication of repetitive DNA. 
Biochemistry 35, 1046&#x2013;1053 (1996).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR38\" id=\"ref-link-section-d131317958e1278\" target=\"_blank\" rel=\"noopener\">38<\/a>; longer runs where MMR plays a more crucial role<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 29\" title=\"Lujan, S. A., Clark, A. B. &amp; Kunkel, T. A. Differences in genome-wide repeat sequence instability conferred by proofreading and mismatch repair defects. Nucleic Acids Res. 43, 4067&#x2013;4074 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR29\" id=\"ref-link-section-d131317958e1283\" target=\"_blank\" rel=\"noopener\">29<\/a>). These unique insights were only appreciable due to the new 89-channel format, offering enhanced capturing of biological variation.<\/p>\n<p>To compare the discriminatory capacity of both classification systems, we also performed de novo signature extraction on our ground truth experimental dataset (n\u2009=\u200937; Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig10\" target=\"_blank\" rel=\"noopener\">6a<\/a>). With COSMIC-83, only two de novo signatures were extracted\u2014one dominated by T insertions at poly-T5+ (ID83A) and the other by T deletions at poly-T6+ (ID83B; Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig10\" target=\"_blank\" rel=\"noopener\">6b<\/a>). 
In contrast, the 89-channel format yielded four signatures, matching our expectation of a predominantly deletion-driven MMRd signature (InD89B), a predominantly insertion-driven polymerase signature (InD89D) and two distinct signatures with differing proportions of InDels (InD89A and InD89C), likely reflecting the combined polymerase\/MMRd phenotypes (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig10\" target=\"_blank\" rel=\"noopener\">6c<\/a>).<\/p>\n<p>Finally, to determine whether this observed relationship between channel information content and signature extraction extended to other datasets and workflows, we applied three different algorithms<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 3\" title=\"Degasperi, A. et al. Substitution mutational signatures in whole-genome-sequenced cancers in the UK population. Science 376, science.abl9283 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR3\" id=\"ref-link-section-d131317958e1311\" target=\"_blank\" rel=\"noopener\">3<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 8\" title=\"Jin, H. et al. Accurate and sensitive mutational signature analysis with MuSiCal. Nat. Genet. 56, 541&#x2013;552 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR8\" id=\"ref-link-section-d131317958e1314\" target=\"_blank\" rel=\"noopener\">8<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 39\" title=\"Islam, S. M. A. et al. Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor. Cell Genom. 
2, 100179 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR39\" id=\"ref-link-section-d131317958e1317\" target=\"_blank\" rel=\"noopener\">39<\/a> to an unrelated cohort of 52 colorectal WGS from ICGC<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 33\" title=\"The ICGC\/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 578, 82&#x2013;93 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR33\" id=\"ref-link-section-d131317958e1321\" target=\"_blank\" rel=\"noopener\">33<\/a> (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig10\" target=\"_blank\" rel=\"noopener\">6d<\/a>). All three algorithms failed to discern all available signatures using COSMIC-83, reaching a discrimination limit of five and yielding sparse signatures with signal density highly concentrated in two channels (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig10\" target=\"_blank\" rel=\"noopener\">6d,e,g<\/a>). In contrast, the 89-channel format consistently enabled the detection of more de novo signatures across all algorithms used (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig10\" target=\"_blank\" rel=\"noopener\">6f,h<\/a>). 
The extracted signatures also displayed signals across more channels, highlighting the superior performance of the 89-channel classification over COSMIC-83 in uncovering additional, true mutational processes.<\/p>\n<p>New InDel signatures (InDs) in seven cancer types<\/p>\n<p>To explore the impact of our new InDel taxonomy on signature discovery beyond PRRd phenotypes in human cancers, we analyzed seven tumor types (n\u2009=\u20094,775) known to display clinically relevant high tumor mutational burden (TMB) due to a range of abnormalities (for example, MMRd, environmental ultraviolet (UV) radiation, APOBEC-related mutagenesis)\u2014bladder (n\u2009=\u2009347), brain (CNS, n\u2009=\u2009392), colorectal (n\u2009=\u20092,146), endometrial (n\u2009=\u2009695), lung (n\u2009=\u2009958), stomach (n\u2009=\u2009181) and skin (n\u2009=\u200956) cancers from the GEL 100,000 Genomes Project<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 35\" title=\"Turnbull, C. Introducing whole-genome sequencing into routine cancer care: the Genomics England 100 000 Genomes Project. Ann. Oncol. 29, 784&#x2013;787 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR35\" id=\"ref-link-section-d131317958e1367\" target=\"_blank\" rel=\"noopener\">35<\/a> (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig3\" target=\"_blank\" rel=\"noopener\">3a<\/a>).<\/p>\n<p><b id=\"Fig3\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Fig. 
3: De novo signature extraction using redefined InDel taxonomy uncovers 37 InDS in seven cancer types in the GEL cohort.<\/b><a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41588-025-02152-y\/figures\/3\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig3\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/04\/41588_2025_2152_Fig3_HTML.png\" alt=\"figure 3\" loading=\"lazy\" width=\"685\" height=\"651\"\/><\/a><\/p>\n<p><b>a<\/b>, InDel burden across seven cancer types (n\u2009=\u20094,775; left) and the number of mutations contributed by each InD to the GEL tumors. The size of each dot represents the proportion of samples of each tumor type that shows the mutational signature. The color of each dot represents the median mutation burden (per Mb) of the signature in samples that show the signature. <b>b<\/b>, Profiles of 37 consensus InDel mutational signatures (InDS) extracted and curated from seven GEL cancer cohorts (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">10<\/a>; <a href=\"https:\/\/signal.mutationalsignatures.com\/explore\/main\/cancer\/signatures?mutationType=3&amp;study=7\" target=\"_blank\" rel=\"noopener\">https:\/\/signal.mutationalsignatures.com\/explore\/main\/cancer\/signatures?mutationType=3&amp;study=7<\/a>). Putative etiologies are provided in the top-left squircles. 
N-Slip, nascent strand slippage; T-Slip, template strand slippage; NHEJ, nonhomologous end joining.<\/p>\n<p>We performed mutational signature analysis per tumor type as previously described<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 3\" title=\"Degasperi, A. et al. Substitution mutational signatures in whole-genome-sequenced cancers in the UK population. Science 376, science.abl9283 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR3\" id=\"ref-link-section-d131317958e1415\" target=\"_blank\" rel=\"noopener\">3<\/a> (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig3\" target=\"_blank\" rel=\"noopener\">3a<\/a>, Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig11\" target=\"_blank\" rel=\"noopener\">7<\/a>, and Supplementary Tables <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">9<\/a>\u2013<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">11<\/a>; <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"section anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Sec9\" target=\"_blank\" rel=\"noopener\">Methods<\/a>). We identified 37 consensus InDel signatures, referred to as InDS (to distinguish from COSMIC IDS; Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig3\" target=\"_blank\" rel=\"noopener\">3b<\/a>). Ten signatures shared characteristics mappable to known IDS (InD1, InD2a, InD3a\/InD3b, InD4a, InD6, InD8, InD9a, InD13 and InD18)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 7\" title=\"Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94&#x2013;101 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR7\" id=\"ref-link-section-d131317958e1438\" target=\"_blank\" rel=\"noopener\">7<\/a>. The remaining 27 were new.<\/p>\n<p>Exogenous exposures underlie five InDS. InD3a and InD3b often co-occurred in lung cancers with tobacco exposure. InD3a\/InD3b clustered with experimental signatures induced by benzo(a)pyrene and its metabolite benzo(a)pyrene diol epoxide (Extended Data Figs. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig12\" target=\"_blank\" rel=\"noopener\">8<\/a> and <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig13\" target=\"_blank\" rel=\"noopener\">9<\/a>), supporting the notion that they represent modulated versions of tobacco-related DNA damage. InD13, characterized by T deletions at TT dinucleotides, is linked to UV damage, and InD18, found exclusively in colorectal samples, is due to colibactin exposure<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 40\" title=\"Pleguezuelos-Manzano, C. et al. Mutational signature in colorectal cancer caused by genotoxic pks+ E. coli. 
Nature 580, 269&#x2013;273 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR40\" id=\"ref-link-section-d131317958e1451\" target=\"_blank\" rel=\"noopener\">40<\/a>. InD32 was identified in samples with prior exposure to platinum and was associated with a new platinum-associated signature, SBS112 (ref. <a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 3\" title=\"Degasperi, A. et al. Substitution mutational signatures in whole-genome-sequenced cancers in the UK population. Science 376, science.abl9283 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR3\" id=\"ref-link-section-d131317958e1455\" target=\"_blank\" rel=\"noopener\">3<\/a>).<\/p>\n<p>Twenty InDS had probable endogenous origins (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig13\" target=\"_blank\" rel=\"noopener\">9<\/a>). Several have been described, including InD1 and InD2a, errors associated with nascent and template strand slippage during normal DNA replication, respectively<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 7\" title=\"Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94&#x2013;101 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR7\" id=\"ref-link-section-d131317958e1465\" target=\"_blank\" rel=\"noopener\">7<\/a>. InD1 and InD2a were seen universally across all tumor types except CNS and skin cancers, which showed a tissue-specific variant, InD2b (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig3\" target=\"_blank\" rel=\"noopener\">3a<\/a>). 
InD4a is attributed to TOP1 transcription-associated mutagenesis<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 14\" title=\"Reijns, M. A. M. et al. Signatures of TOP1 transcription-associated mutagenesis in cancer and germline. Nature 602, 623&#x2013;631 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR14\" id=\"ref-link-section-d131317958e1472\" target=\"_blank\" rel=\"noopener\">14<\/a>. InD6, marked by microhomology-mediated deletions, is associated with deficiency in HR repair<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 7\" title=\"Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94&#x2013;101 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR7\" id=\"ref-link-section-d131317958e1476\" target=\"_blank\" rel=\"noopener\">7<\/a>. InD8, which had deletions with little to no microhomology at deletion junctions, likely reflects the footprint of nonhomologous end-joining activity and\/or radiotherapy<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 41\" title=\"Kocakavuk, E. et al. Radiotherapy is associated with a deletion signature that contributes to poor outcomes in patients with cancer. Nat. Genet. 53, 1088&#x2013;1096 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR41\" id=\"ref-link-section-d131317958e1481\" target=\"_blank\" rel=\"noopener\">41<\/a>.<\/p>\n<p>InD9a, correlated with SBS2 and SBS13 hypermutation, featured 1\u2009bp C deletions at TCT and TCA (mutated base underlined), identical to mutable motifs characteristic of SBS2\/SBS13, particularly at short poly-T tracts. It was presumptively induced by APOBEC (Extended Data Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig12\" target=\"_blank\" rel=\"noopener\">8c<\/a>), corroborated by experimental evidence from an APOBEC overexpression DT40 model<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 42\" title=\"DeWeerd, R. A. et al. Prospectively defined patterns of APOBEC3A mutagenesis are prevalent in human cancers. Cell Rep 38, 110555 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR42\" id=\"ref-link-section-d131317958e1498\" target=\"_blank\" rel=\"noopener\">42<\/a>. We proposed a mutagenesis mechanism wherein, following C-to-U deamination at TCT by APOBEC, uracil removal by UNG leaves an uninformative abasic site. Template strand slippage can then occur at this short repetitive T tract, leading to a C deletion (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig12\" target=\"_blank\" rel=\"noopener\">8d<\/a>). For reasons currently unclear, we also found similar C-deletion-dominated InD9b\/InD9c, which, although they resembled InD9a, lacked the predilection for a preceding T and were possibly caused by an alternative mechanism.<\/p>\n<p>Interestingly, we extracted eight gene-specific MMRd and Pol-dys InDS. MMRd-InD7 contrasts with COSMIC ID7. InD7 is characterized by the expected excess of 1\u2009bp and 2\u2009bp deletions, particularly at longer mononucleotide\/dinucleotide repeat tracts. InD7 clustered with experimental signatures of \u0394MLH1, \u0394MSH2 and \u0394MSH6 (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig13\" target=\"_blank\" rel=\"noopener\">9<\/a>). 
We also identified InD19 (due to PMS2 deficiency), InD14 (associated with POLD1 exonuclease mutations), InD15 (associated with POLE exonuclease mutations), InD16a and InD16b (resulting from concurrent loss of POLE proofreading and MMR), InD21 (associated with combined POLD1 proofreading defect and MMRd) and InD20, which our experimental investigations showed was due to MMRd occurring on a POLE dysfunction background.<\/p>\n<p>The remaining 12 signatures were of uncertain etiology. Five were probably artifacts\u2014InD27 and InD28 often co-occurred, incurring thousands of InDels, and were related to SBS57, potentially an amplification or a sequencing artifact<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 7\" title=\"Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94&#x2013;101 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR7\" id=\"ref-link-section-d131317958e1546\" target=\"_blank\" rel=\"noopener\">7<\/a>. InD28m was likely a mixed signature of InD28 and InD4, remaining to be resolved with larger cohorts. InD5 and InD10 were ubiquitous and possibly artifacts.<\/p>\n<p>While C insertions dominated both InD26 and InD30 at poly-C tracts followed by a 3\u2032A, InD30 C insertions induced thousands of insertions at homopolymers CCC and CCCC, whereas InD26 C insertions mainly occurred at longer CCCCC and were not associated with hypermutation.<\/p>\n<p>Three InDS (InD31, InD24 and InD12) showed striking correlations with signatures of other classes. InD31 displayed distinct C deletions at short homopolymers (<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 3\" title=\"Degasperi, A. et al. Substitution mutational signatures in whole-genome-sequenced cancers in the UK population. 
Science 376, science.abl9283 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR3\" id=\"ref-link-section-d131317958e1556\" target=\"_blank\" rel=\"noopener\">3<\/a>), and often co-occurred with InD8. InD24 deletions peaked prominently at GTA and GTG and were strongly correlated with DBS8, which shows double substitutions at the same motifs (TGTG\u2009&gt;\u2009TAGG\/TTGG). InD12 exhibited C deletions between dinucleotides AA and AT and was associated with DBS25 featuring a tall peak at TT dinucleotide. Despite clear co-occurrence, the causes for these signatures remain cryptic.<\/p>\n<p>InD4b and InD29 shared features with InD4a and InD8, respectively. Whether they represented tissue-specific variants, were mixed or caused by different mechanisms requires further investigation. InD11 appeared related to InD1 and might be an oversplit signature frequently enriched in high InDel burden samples, such as those with MMRd and Pol-dys. Seen in bladder and colorectal cancers, InD23 showed a striking pattern of longer insertions (\u22655\u2009bp) at nonrepeats. These insertions were almost exclusively tandemly duplicated from immediate neighboring sequences. InD33 was most prominent in one CNS tumor treated with temozolomide; however, its etiology remains unknown.<\/p>\n<p>In summary, 5 InDS were of likely exogenous origins (InD3a, InD3b, InD13, InD18 and InD32), 20 were endogenous (InD1, InD2a, InD2b, InD4a, InD4b, InD6, InD7, InD8, InD9a, InD9b, InD9c, InD11, InD14, InD15, InD16a, InD16b, InD19, InD20, InD21 and InD29) and 12 had uncertain sources (InD5, InD10, InD12, InD23, InD24, InD26, InD27, InD28, InD28m, InD30, InD31 and InD33).<\/p>\n<p>A signature-based classifier of PRR dysfunction<\/p>\n<p>PRRd subtypes, typified by MSI, are clinically actionable with potential selective sensitivity to immunotherapies<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Rizvi, N. A. 
et al. Cancer immunology. Mutational landscape determines sensitivity to PD-1 blockade in non-small cell lung cancer. Science 348, 124&#x2013;128 (2015).\" href=\"#ref-CR20\" id=\"ref-link-section-d131317958e1592\">20<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Cristescu, R. et al. Pan-tumor genomic biomarkers for PD-1 checkpoint blockade-based immunotherapy. Science 362, eaar3593 (2018).\" href=\"#ref-CR21\" id=\"ref-link-section-d131317958e1592_1\">21<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 22\" title=\"Wang, F. et al. Evaluation of POLE and POLD1 Mutations as Biomarkers for Immunotherapy Outcomes Across Multiple Cancer Types. JAMA Oncol. 5, 1504&#x2013;1506 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR22\" id=\"ref-link-section-d131317958e1595\" target=\"_blank\" rel=\"noopener\">22<\/a>. Current methods of detecting PRRd mainly rely on immunohistochemistry (IHC) staining of MMR proteins (but not for polymerase mutants) and\/or PCR-based assays to determine MSI at selected genomic loci. These assays are not sensitive or robust enough, especially in nonepithelial tissues<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 16\" title=\"Chung, J. et al. DNA Polymerase and Mismatch Repair Exert Distinct Microsatellite Instability Signatures in Normal and Malignant Human Cells. Cancer Discov. 11, 1176&#x2013;1191 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR16\" id=\"ref-link-section-d131317958e1599\" target=\"_blank\" rel=\"noopener\">16<\/a>. 
Using insights from this study, we therefore explored constructing a classifier for tumor PRRd stratification, reporting MMRd, Pol-dys and mixed MMRd\/Pol-dys as distinct classes versus PRR proficiency.<\/p>\n<p>We used 571 GEL cancers assigned as MMRd (n\u2009=\u2009214), Pol-dys (n\u2009=\u200936), mixed MMRd\/Pol-dys (n\u2009=\u200941) or PRR-proficient (controls, n\u2009=\u2009280) based on confirmed causal genotypes and allelic status, and\/or supporting IHC staining (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig4\" target=\"_blank\" rel=\"noopener\">4a<\/a> and Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">12<\/a>). Samples treated as controls had neither MMRd nor Pol-dys, as confirmed by the lack of driver mutations in key MMR genes (that is, MLH1, MSH2, MSH6 and PMS2), POLE and POLD1, and displayed no evidence of MSI associated with these abnormalities<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 43\" title=\"Huang, M. N. et al. MSIseq: software for assessing microsatellite instability from catalogs of somatic mutations. Sci Rep. 5, 13321 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR43\" id=\"ref-link-section-d131317958e1644\" target=\"_blank\" rel=\"noopener\">43<\/a>. We trained multiple multinomial elastic net regression models, applying 7:3 partitioning iteratively across the dataset. 
Through exploring all possible features\/models (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">13<\/a>), we identified exposures of SBS and InDS associated with MMRd, Pol-dys and mixed MMRd\/Pol-dys, as well as the ratio of total InDels to substitutions as the most predictive features (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig4\" target=\"_blank\" rel=\"noopener\">4b<\/a> and Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">14<\/a>; <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"section anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Sec9\" target=\"_blank\" rel=\"noopener\">Methods<\/a>). The final model, termed PRRDetect (postreplicative repair detect), was retrained on the entire dataset (n\u2009=\u2009571). Then, in an independent validation cohort of 504 ICGC breast cancers<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 44\" title=\"Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47&#x2013;54 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR44\" id=\"ref-link-section-d131317958e1664\" target=\"_blank\" rel=\"noopener\">44<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 45\" title=\"Davies, H. et al. Whole-genome sequencing reveals breast cancers with mismatch repair deficiency. 
Cancer Res. 77, 4755&#x2013;4762 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR45\" id=\"ref-link-section-d131317958e1667\" target=\"_blank\" rel=\"noopener\">45<\/a> and 847 GEL cancers, for which the true labels of PRRd were known, PRRDetect achieved an AUROC (area under the ROC curve) of 1 and an AUPRC (area under the precision\u2013recall curve) of 0.99 at distinguishing PRR-dysfunctional from PRR-proficient samples, outperforming other MSI\/MMRd detection tools, including MSIseq<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 43\" title=\"Huang, M. N. et al. MSIseq: software for assessing microsatellite instability from catalogs of somatic mutations. Sci Rep. 5, 13321 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR43\" id=\"ref-link-section-d131317958e1671\" target=\"_blank\" rel=\"noopener\">43<\/a>, MMRDetect<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 46\" title=\"Zou, X. et al. A systematic CRISPR screen defines mutational mechanisms underpinning signatures caused by replication errors and endogenous DNA damage. Nat. Cancer 2, 643&#x2013;657 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR46\" id=\"ref-link-section-d131317958e1675\" target=\"_blank\" rel=\"noopener\">46<\/a> and TMB\u2014an approved biomarker for immunotherapies<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 20\" title=\"Rizvi, N. A. et al. Cancer immunology. Mutational landscape determines sensitivity to PD-1 blockade in non-small cell lung cancer. 
Science 348, 124&#x2013;128 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR20\" id=\"ref-link-section-d131317958e1679\" target=\"_blank\" rel=\"noopener\">20<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 21\" title=\"Cristescu, R. et al. Pan-tumor genomic biomarkers for PD-1 checkpoint blockade-based immunotherapy. Science 362, eaar3593 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR21\" id=\"ref-link-section-d131317958e1682\" target=\"_blank\" rel=\"noopener\">21<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Subbiah, V., Solit, D. B., Chan, T. A. &amp; Kurzrock, R. The FDA approval of pembrolizumab for adult and pediatric patients with tumor mutational burden (TMB)&#x2009;&#x2265;&#x2009;10: a decision centered on empowering patients and their physicians. Ann. Oncol. 31, 1115&#x2013;1118 (2020).\" href=\"#ref-CR47\" id=\"ref-link-section-d131317958e1685\">47<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Samstein, R. M. et al. Tumor mutational load predicts survival after immunotherapy across multiple cancer types. Nat. Genet. 51, 202&#x2013;206 (2019).\" href=\"#ref-CR48\" id=\"ref-link-section-d131317958e1685_1\">48<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 49\" title=\"Marabelle, A. et al. Association of tumour mutational burden with outcomes in patients with advanced solid tumours treated with pembrolizumab: prospective biomarker analysis of the multicohort, open-label, phase 2 KEYNOTE-158 study. Lancet Oncol. 
21, 1353&#x2013;1365 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR49\" id=\"ref-link-section-d131317958e1688\" target=\"_blank\" rel=\"noopener\">49<\/a> (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig4\" target=\"_blank\" rel=\"noopener\">4c,d<\/a> and Supplementary Tables <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">12<\/a>, <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">15<\/a> and <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">16<\/a>).<\/p>\n<p><b id=\"Fig4\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Fig. 4: PRRDetect improves the detection of tumors with PRR dysfunction.<\/b><a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41588-025-02152-y\/figures\/4\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig4\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/04\/41588_2025_2152_Fig4_HTML.png\" alt=\"figure 4\" loading=\"lazy\" width=\"685\" height=\"562\"\/><\/a><\/p>\n<p><b>a<\/b>, Simplified workflow of the development of the PRRDetect classifier. (1) Initial exploratory training using 571 ground truth samples. (2) Final retraining to produce the PRRDetect classifier. -ve ctrl, negative control. 
<b>b<\/b>, Distribution of coefficients across seven genomic features contributing to the final PRRDetect classifier. Green error bars depict the mean\u2009\u00b1\u2009s.d. from ten replicates of training in cross-validation. Red dots indicate the final coefficients chosen for each class prediction (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">14<\/a>). <b>c<\/b>, Validation and application of PRRDetect on independent cancer cohorts. <b>d<\/b>, ROC curves demonstrating the superior performance of PRRDetect on the independent cancer cohort (n\u2009=\u20091,351) against alternative biomarker strategies. P values were calculated using a two-sided nonparametric test based on the bootstrap distribution (10,000) of the difference in AUCs<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 53\" title=\"Robin, X. et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12, 77 (2011).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR53\" id=\"ref-link-section-d131317958e1738\" target=\"_blank\" rel=\"noopener\">53<\/a>. MMRDetect, P\u2009\u221216; MSIseq, P\u2009=\u20096.617\u2009\u00d7\u200910\u221215; TMB, P\u2009\u221216. <b>e<\/b>, PRRDetect results of n\u2009=\u20091,335 ICGC and Hartwig cancers, ordered from the lowest to the highest prediction probability across the x axis (left to right) for MMRd (purple), combined MMRd\/Pol-dys (blue) and Pol-dys samples (orange). Negative samples were ordered by TMB in increasing order from left to right. Results of MSIseq, MMRDetect, cancer gene driver annotation and cancer tissue origin are shown in the bottom tracks. 
Dashed rectangle highlights the extent of false positive overcalling if using TMB\u2009&gt;\u200910 mutations per Mb as a cutoff. <b>f<\/b>, Concordance of calls among TMB-high (&gt;10 mutations per Mb), positive exposure to SBS signatures that impart hypermutation and PRRDetect prediction across n\u2009=\u20091,335 ICGC and Hartwig cancers. <b>g<\/b>, Concordance of calls among TMB-high (&gt;10 mutations per Mb), positive exposure to SBS signatures that impart hypermutation and PRRDetect prediction across n\u2009=\u20094,775 GEL tumors. muts, mutations.<\/p>\n<p>Next, to survey the prevalence of PRRd across alternative cancer cohorts, we applied PRRDetect on seven cancer types commonly enriched with hypermutator samples from ICGC<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 33\" title=\"The ICGC\/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 578, 82&#x2013;93 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR33\" id=\"ref-link-section-d131317958e1791\" target=\"_blank\" rel=\"noopener\">33<\/a> and Hartwig<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 34\" title=\"Priestley, P. et al. Pan-cancer whole-genome analyses of metastatic solid tumours. Nature 575, 210&#x2013;216 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR34\" id=\"ref-link-section-d131317958e1795\" target=\"_blank\" rel=\"noopener\">34<\/a> (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig4\" target=\"_blank\" rel=\"noopener\">4c,e,f<\/a>, Extended Data Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig14\" target=\"_blank\" rel=\"noopener\">10a,b<\/a> and Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">17<\/a>). PRRDetect predicted 3.7% (50\/1,335) of samples as PRR-dysfunctional, correctly identifying all Pol-dys and MMRd\/Pol-dys samples while missing two subclonal MMRd samples (based on available published driver information for PRRd status). MSIseq missed 6 of 43 PRRDetect-predicted MMRd cases and 2 of 4 mixed MMRd\/Pol-dys cases, and displayed poor concordance for pure Pol-dys cases (that is, it missed all 7 cases). Unsurprisingly, PRRDetect captured all MMRDetect-positive cases. However, MMRDetect failed to identify all PRRd cases as it was not designed to detect Pol-dys\/mixed phenotypes and missed seven MMRd samples. Crucially, we noted that many PRRDetect-positive cases did not have an associated driver mutation identified (33\/50). This is clinically significant. Of 50 PRRDetect-positive cases, 39 were MMRd (only 8 had an associated driver mutation), 7 were Pol-dys (all had driver mutations in polymerase proofreading domains) and 4 were predicted as mixed MMRd\/Pol-dys (2 had POLE exonuclease mutations and none had MMR drivers). If all PRRDetect predictions were true and only sequencing approaches focused exclusively on identifying driver events associated with these deficiencies were used, a significant proportion of cases (66%) could be missed.<\/p>\n<p>Given that PRRd cancers often present with high TMB, and TMB is used as a biomarker for immunotherapies, we explored the limits of TMB-based patient stratification. 
With an FDA-approved TMB cutoff of 10 mutations per Mb<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 49\" title=\"Marabelle, A. et al. Association of tumour mutational burden with outcomes in patients with advanced solid tumours treated with pembrolizumab: prospective biomarker analysis of the multicohort, open-label, phase 2 KEYNOTE-158 study. Lancet Oncol. 21, 1353&#x2013;1365 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR49\" id=\"ref-link-section-d131317958e1815\" target=\"_blank\" rel=\"noopener\">49<\/a>, just over a tenth of 459 cases classified as TMB-high (50\/459, 10.9%) had predicted PRR dysfunction (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig4\" target=\"_blank\" rel=\"noopener\">4f<\/a>, Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig14\" target=\"_blank\" rel=\"noopener\">10b<\/a> and Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">17<\/a>). The majority of other cases (353\/459, 76.9%) had high TMB from tobacco, UV and APOBEC exposure; 56 (12.2%) were due to alternative causes. 
Thus, across independent cancer cohorts where MMRd and Pol-dys are known to occur at higher frequencies, ~89% of the samples classified as TMB-high may not have the intrinsic biological underpinnings associated with response to immunotherapies, with implications for the use of TMB as a selective biomarker for ICI<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 50\" title=\"McGrail, D. J. et al. High tumor mutation burden fails to predict immune checkpoint blockade response across all cancer types. Ann. Oncol. 32, 661&#x2013;672 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR50\" id=\"ref-link-section-d131317958e1828\" target=\"_blank\" rel=\"noopener\">50<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 51\" title=\"Addeo, A., Banna, G. L. &amp; Weiss, G. J. Tumor mutation burden-from hopes to doubts. JAMA Oncol. 5, 934&#x2013;935 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#ref-CR51\" id=\"ref-link-section-d131317958e1831\" target=\"_blank\" rel=\"noopener\">51<\/a>.<\/p>\n<p>We asked whether this trend extended to the larger GEL cohort (n\u2009=\u20094,775). Among the 1,371 TMB-high cases, nearly half (677, 49.4%) were predicted as having MMRd and\/or Pol-dys (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig4\" target=\"_blank\" rel=\"noopener\">4g<\/a>), of which only ~50% had an identified driver. The remaining 564 (41.1%) had high TMB due to alternative mutagenic exposures; 130 (9.5%) were due to other undetermined causes. 
Furthermore, beyond revealing PRR dysfunction in typical tumor types such as colorectal cancers (19%, 400\/2,146) and uterine cancers (37%, 255\/695), PRRDetect predicted PRRd in a small but notable proportion of stomach (11\/181, 6%), bladder (3\/347, 1%), CNS (3\/392, 1%) and lung cancers (8\/958, 1%; Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#Fig14\" target=\"_blank\" rel=\"noopener\">10c<\/a> and Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02152-y#MOESM4\" target=\"_blank\" rel=\"noopener\">12<\/a>). This reinforces two important clinical points\u2014first, PRRd is not restricted to colorectal and uterine cancers despite being more prevalent in these tumor types; second, WGS can serve as a tumor-agnostic assay uncovering PRRd and any other actionable biological abnormalities in the future.<\/p>\n","protected":false},"excerpt":{"rendered":"Diversity of InDel patterns in PRRd We generated a \u2018ground truth\u2019 set of isogenic cellular models by 
introducing&hellip;\n","protected":false},"author":2,"featured_media":14199,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3846],"tags":[3971,3973,3967,3970,3972,3968,267,3969,9826,70,9827,16,15],"class_list":{"0":"post-14198","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-genetics","8":"tag-agriculture","9":"tag-animal-genetics-and-genomics","10":"tag-biomedicine","11":"tag-cancer-research","12":"tag-gene-function","13":"tag-general","14":"tag-genetics","15":"tag-human-genetics","16":"tag-personalized-medicine","17":"tag-science","18":"tag-tumour-biomarkers","19":"tag-uk","20":"tag-united-kingdom"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@uk\/114325948036929883","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/14198","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/comments?post=14198"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/14198\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media\/14199"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media?parent=14198"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/categories?post=14198"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/tags?post=14198"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}
]}}