{"id":413882,"date":"2025-09-10T19:14:12","date_gmt":"2025-09-10T19:14:12","guid":{"rendered":"https:\/\/www.europesays.com\/uk\/413882\/"},"modified":"2025-09-10T19:14:12","modified_gmt":"2025-09-10T19:14:12","slug":"fluctuating-dna-methylation-tracks-cancer-evolution-at-clinical-scale","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/uk\/413882\/","title":{"rendered":"Fluctuating DNA methylation tracks cancer evolution at clinical scale"},"content":{"rendered":"<p>Assembly and quality control of DNA methylation data<\/p>\n<p>We assembled and processed with a harmonized pipeline<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 14\" title=\"Duran-Ferrer, M. et al. The proliferative history shapes the DNA methylome of B-cell tumors and predicts clinical outcome. Nat. Cancer 1, 1066&#x2013;1081 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR14\" id=\"ref-link-section-d420590567e2464\" target=\"_blank\" rel=\"noopener\">14<\/a> (v4.1; see Code availability section) 2,430 bulk sample Illumina methylation array data of normal and neoplastic lymphoid cells from previous publications<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 14\" title=\"Duran-Ferrer, M. et al. The proliferative history shapes the DNA methylome of B-cell tumors and predicts clinical outcome. Nat. Cancer 1, 1066&#x2013;1081 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR14\" id=\"ref-link-section-d420590567e2468\" target=\"_blank\" rel=\"noopener\">14<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Kulis, M. et al. Whole-genome fingerprint of the DNA methylome during human B cell differentiation. Nat. Genet. 
47, 746&#x2013;756 (2015).\" href=\"#ref-CR21\" id=\"ref-link-section-d420590567e2471\">21<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Nordlund, J. et al. Genome-wide signatures of differential DNA methylation in pediatric acute lymphoblastic leukemia. Genome Biol. 14, r105 (2013).\" href=\"#ref-CR22\" id=\"ref-link-section-d420590567e2471_1\">22<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Reinius, L. E. et al. Differential DNA methylation in purified human blood cells: implications for cell lineage and studies on disease susceptibility. PLoS ONE 7, e41361 (2012).\" href=\"#ref-CR23\" id=\"ref-link-section-d420590567e2471_2\">23<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Lee, S. T. et al. Epigenetic remodeling in B-cell acute lymphoblastic leukemia occurs in two tracks and employs embryonic stem cell-like signatures. Nucleic Acids Res. 43, 2590&#x2013;2602 (2015).\" href=\"#ref-CR24\" id=\"ref-link-section-d420590567e2471_3\">24<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Queir&#xF3;s, A. C. et al. Decoding the DNA methylome of mantle cell lymphoma in the light of the entire B cell lineage. Cancer Cell 30, 806&#x2013;821 (2016).\" href=\"#ref-CR25\" id=\"ref-link-section-d420590567e2471_4\">25<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Nadeu, F. et al. Genomic and epigenomic insights into the origin, pathogenesis, and clinical behavior of mantle cell lymphoma subtypes. 
Blood 136, 1419&#x2013;1432 (2020).\" href=\"#ref-CR26\" id=\"ref-link-section-d420590567e2471_5\">26<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Nadeu, F. et al. Detection of early seeding of Richter transformation in chronic lymphocytic leukemia. Nat. Med. 28, 1662&#x2013;1671 (2022).\" href=\"#ref-CR27\" id=\"ref-link-section-d420590567e2471_6\">27<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Oakes, C. C. et al. DNA methylation dynamics during B cell maturation underlie a continuum of disease phenotypes in chronic lymphocytic leukemia. Nat. Genet. 48, 253&#x2013;264 (2016).\" href=\"#ref-CR28\" id=\"ref-link-section-d420590567e2471_7\">28<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Dietrich, S. et al. Drug-perturbation-based stratification of blood cancer. J. Clin. Invest. 128, 427&#x2013;445 (2018).\" href=\"#ref-CR29\" id=\"ref-link-section-d420590567e2471_8\">29<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 30\" title=\"Agirre, X. et al. Whole-epigenome analysis in multiple myeloma reveals DNA hypermethylation of B cell-specific enhancers. Genome Res. 25, 478&#x2013;487 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR30\" id=\"ref-link-section-d420590567e2474\" target=\"_blank\" rel=\"noopener\">30<\/a>. As healthy control samples, this dataset contained sorted CD19+ B cells (n\u2009=\u200940), CD3+ T cells (n\u2009=\u200935), peripheral blood mononuclear cells (n\u2009=\u20096) and whole-blood samples (n\u2009=\u20096). 
As tumour samples, we included precursor 797 B-ALLs and 90 T-ALLs at diagnosis, 28 B-ALLs and 2 T-ALLs at relapse, as well as 74 B-ALLs and 12 T-ALLs at complete remission (that is, normal blood); 149 MCLs; 722 CLLs, 55 of its precursor condition MBL and 6 samples from patients with CLL undergoing a DLBCL transformation called Richter transformation; 62 primary DLBCL, not otherwise specified; and 104 multiple myeloma and 16 of its precursor condition monoclonal gammopathy of undetermined significance. In brief, raw idat files were loaded and processed with R (v4.3.1) using the minfi package<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 50\" title=\"Aryee, M. J. et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 30, 1363&#x2013;1369 (2014).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR50\" id=\"ref-link-section-d420590567e2495\" target=\"_blank\" rel=\"noopener\">50<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 51\" title=\"Fortin, J.-P., Triche, T. J. Jr &amp; Hansen, K. D. Preprocessing, normalization and integration of the Illumina HumanMethylationEPIC array with minfi. Bioinformatics 33, 558&#x2013;560 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR51\" id=\"ref-link-section-d420590567e2498\" target=\"_blank\" rel=\"noopener\">51<\/a> (v1.46.0) in batches as specified in the column \u2018SSNOB_NORMALIZATION_BATCH\u2019 of Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#MOESM3\" target=\"_blank\" rel=\"noopener\">2<\/a>. In brief, the data were processed for each batch as follows. 
First, idat files were loaded into an RGChannelSet object, and minfi quality metrics were generated using the qcReport function. We removed samples with unexpected distributions of methylation values (that is, distributions markedly distinct from a bimodal distribution centred around \u03b2-values of 0 and 1 and\/or from the remaining samples) and samples with low signal intensities of internal control probes, including bisulfite conversions I and II, extension hybridization, hybridization, non-polymorphic, specificities I and II, and target removal probes.<\/p>\n<p>Next, further quality metrics were derived using the function minfiQC on the unnormalized RGChannelSet object. Samples with median signal intensities of the unmethylated and methylated channels of at least 10.5 on the log<sub>2<\/sub> scale were considered to have good signal intensities. Subsequently, detection P values were calculated across all CpGs and samples using the detectionP function on the unnormalized RGChannelSet object. Samples were considered good if their mean detection P value across all CpGs was P\u2009\u2264\u20090.01. At the CpG level, we retained CpGs with a detection P\u2009\u2264\u20091\u2009\u00d7\u200910<sup>\u221216<\/sup> in 90% or more of the samples, which has been shown to improve the quality of downstream analyses<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 52\" title=\"Lehne, B. et al. A coherent approach for analysis of the Illumina HumanMethylation450 BeadChip improves data quality and performance in epigenome-wide association studies. Genome Biol. 16, 37 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR52\" id=\"ref-link-section-d420590567e2525\" target=\"_blank\" rel=\"noopener\">52<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 53\" title=\"Zhou, W., Triche, T. J., Laird, P. W. 
&amp; Shen, H. SeSAMe: reducing artifactual detection of DNA methylation by Infinium BeadChips in genomic deletions. Nucleic Acids Res. 46, e123 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR53\" id=\"ref-link-section-d420590567e2528\" target=\"_blank\" rel=\"noopener\">53<\/a>. The RGChannelSet object was normalized with the single-sample batch-independent preprocessNoob function with dye bias correction. We next retained only CpGs (excluding CH probes) that contained no SNP in either the interrogated CpG or the probe extension, using the dropMethylationLoci and dropLociWithSnps functions with default options (minor allele frequency (MAF)\u2009=\u20090). Further analyses based on long-read nanopore data, Illumina array control probes, annotation packages and a data-driven approach were performed to rule out genetic confounding in the methylation values of the resulting fCpGs (see the next sections).<\/p>\n<p>Furthermore, CpGs with any previous evidence of potential cross-hybridization were excluded<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 54\" title=\"Chen, Y. A. et al. Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray. Epigenetics 8, 203&#x2013;209 (2013).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR54\" id=\"ref-link-section-d420590567e2535\" target=\"_blank\" rel=\"noopener\">54<\/a> and only CpGs mapping to autosomal chromosomes were subsequently retained for downstream analyses. Finally, to further confirm the accuracy of the filtering criteria, we checked the distribution of normalized methylation values and performed principal component analyses separately for samples passing all quality checks and for those flagged as low quality. 
The final DNA methylation matrix contained 2,204 samples and 389,180 CpGs passing all the aforementioned quality controls, and included 2,054 patients (22 technical replicates, 3 synchronic and 125 longitudinal samples from the same patients)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 55\" title=\"Duran-Ferrer, M., Gabbutt, C., Martin-Subero, J. I. &amp; Graham, T. Harmonised methylation array matrix related to the article Fluctuating DNA methylation tracks cancer evolution at clinical scale. Zenodo https:\/\/doi.org\/10.5281\/ZENODO.15479737 (2025).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR55\" id=\"ref-link-section-d420590567e2539\" target=\"_blank\" rel=\"noopener\">55<\/a> (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#MOESM3\" target=\"_blank\" rel=\"noopener\">2<\/a>).<\/p>\n<p>To determine the purity of samples, we used our previously described deconvolution strategy to infer tumour cell content from DNA methylation<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 14\" title=\"Duran-Ferrer, M. et al. The proliferative history shapes the DNA methylome of B-cell tumors and predicts clinical outcome. Nat. Cancer 1, 1066&#x2013;1081 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR14\" id=\"ref-link-section-d420590567e2549\" target=\"_blank\" rel=\"noopener\">14<\/a>, which was used as the consensus purity for all tumour samples except DLBCL and multiple myeloma. 
In these two tumour entities, we have previously identified a DNA methylation signature loss causing inaccurate tumour purity predictions using DNA methylation data, and therefore we used available genetic or flow cytometry data for DLBCL and multiple myeloma, respectively.<\/p>\n<p>Pipeline to select fluctuating CpGs<\/p>\n<p>We constructed a pipeline to identify fCpGs in lymphoid tumours, based on the following criteria:<\/p>\n<ol class=\"u-list-style-none\">\n<li>\n                  (1)<\/p>\n<p>Heterogeneous across different participants with the same disease (by accepting CpG loci with the top 5% of standard deviation of methylation value within a cancer type).<\/p>\n<\/li>\n<li>\n                  (2)<\/p>\n<p>Equally likely to be methylated or unmethylated (by selecting CpGs with average methylation of approximately 0.5 within a cancer type).<\/p>\n<\/li>\n<li>\n                  (3)<\/p>\n<p>Unlikely to be associated with specific cell or cancer types. We used an unsupervised Laplacian score feature selection metric<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 56\" title=\"He, X., Cai, D. &amp; Niyogi, P. Laplacian score for feature selection. In Advances in Neural Information Processing Systems 18 (eds. Weiss, Y., Sch&#xF6;lkopf, B. &amp; Platt, J.) (MIT Press, 2005).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR56\" id=\"ref-link-section-d420590567e2593\" target=\"_blank\" rel=\"noopener\">56<\/a> to rank CpG loci by their tendency to preserve the nearest-neighbour graph, and accepted the 5% least-informative CpGs.<\/p>\n<\/li>\n<\/ol>\n<p>Exclusion of genetic confounding on fCpGs<\/p>\n<p>We performed a series of analyses to exclude the potential genetic confounding (germline SNPs and somatic SNVs) on our fCpGs. We first excluded the possibility that common germline SNPs caused methylation heterogeneity at fCpG sites between individuals. 
We observed very distinct methylation dynamics of array control probes containing SNPs (which had been removed during the initial array processing) versus fCpGs. SNP probes showed the same distribution in all samples (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#Fig7\" target=\"_blank\" rel=\"noopener\">2c<\/a>), including longitudinally followed cases (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#MOESM1\" target=\"_blank\" rel=\"noopener\">3<\/a>), whereas fCpGs only showed a W distribution in cancer samples with ongoing fluctuations over time. Thus, although SNPs reflect the stable genetic identity of the individual, fCpGs reflect the identity of a single cell and its evolving lineage. In addition, we used the packages SNPlocs.Hsapiens.dbSNP155.GRCh38 (v0.99.24) and MafH5.gnomAD.v4.0.GRCh38 (v3.19) to check for any known significant germline or somatic genetic confounding on the resulting 978 fCpGs. We found that approximately 60% of fCpGs were reported in the gnomAD v4 database (compared with approximately 65% of the array background), but with a very low MAF (median of 1\u2009\u00d7\u200910<sup>\u22125<\/sup> and mean of 1\u2009\u00d7\u200910<sup>\u22123<\/sup>). To exclude the possibility of unknown or very rare genetic confounding, we used the data-driven gaphunting algorithm<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 57\" title=\"Andrews, S. V., Ladd-Acosta, C., Feinberg, A. P., Hansen, K. D. &amp; Fallin, M. D. &#x201C;Gap hunting&#x201D; to characterize clustered probe signals in Illumina methylation array data. 
Epigenetics Chromatin 9, 56 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR57\" id=\"ref-link-section-d420590567e2618\" target=\"_blank\" rel=\"noopener\">57<\/a> available in the minfi R package, which further ruled out possible cancer-specific single-nucleotide variants (SNVs) that could confound the methylation values at the 978 identified fCpGs. Finally, Oxford Nanopore long-read sequencing of a subset of normal and neoplastic samples further validated that fCpGs represent de\/methylated cytosines (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#Fig7\" target=\"_blank\" rel=\"noopener\">2d,e<\/a>; see next section).<\/p>\n<p>Generation and analyses of long-read nanopore data<\/p>\n<p>For long-read methylation sequencing in CLL and Richter transformation samples, DNA concentration was assessed using the Qubit assay and DNA integrity was analysed with either the Femto Pulse System (Agilent) or the Fragment Analyzer (Agilent). When more than 6\u2009\u00b5g of material with good integrity was available, DNA was additionally treated with the Short Fragment Eliminator Kit XS (PacBio) and eluted in EB buffer. Approximately 4\u2009\u00b5g of DNA was used for library preparation according to the standard LSK114 kit and protocol from Oxford Nanopore. The time for DNA repair and end-prep was increased to up to 30\u2009min at 20\u2009\u00b0C and 30\u2009min at 65\u2009\u00b0C. Adapter ligation was performed for 1\u2009h at room temperature. All elutions were performed at 37\u2009\u00b0C for 1.5\u2009h, and 550\u2013600\u2009ng of DNA was loaded onto FLO-PRO114M flow cells (CLL cells). Flow cells were washed (EXP-WSH004) after 1\u20132 days if the pore count decreased to less than 30%. A total of 1\u20134 washes were performed for each flow cell. 
Flow cells were run for 100\u2009h in total (CLL cells) with the Fast model (MinKNOW 23.11.7, Dorado 7.2.13). The raw data were rebasecalled using dorado duplex (v0.5.3), applying the SUP model with modified-base calling to detect 5mC and 5hmC (model dna_r10.4.1_e8.2_400bps_sup@v4.3.0_5mCG_5hmCG@v1).<\/p>\n<p>In normal B cell samples, 1\u20133\u2009\u00b5g of DNA was used for WGS. Libraries were prepared with the DNA ligation kit LSK110 with no modifications. Libraries were loaded onto a flow cell version FLO-PRO002 (R9.4) and were run for 90\u2013110\u2009h. Basecalling was performed in live mode with the Guppy basecaller (v6.2.7), included in MinKNOW (v22.08.6), using the SUP model for base modification detection of 5mC and 5hmC (dna_r9.4.1_450bps_modbases_5hmc_5mc_cg_sup.cfg).<\/p>\n<p>In all samples, the unmapped BAM files generated after basecalling were converted to FASTQ files using the SAMtools fastq -T Mm,Ml command. The FASTQ files were then mapped to BAM files using the command minimap2 -ax map-ont -y ..\/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.mmi. The methylation values were extracted from the BAMs into bedMethyl files using the in-house tool bam2bedmethyl (v0.3.2) and compressed\/indexed using bgzip\/tabix. Reads from each strand were combined to generate DNA matrices for each CpG and were used for obtaining the methylation values of all fCpGs.<\/p>\n<p>In addition, mini BAM files containing all reads from the 976 fCpGs were generated (in hg38 genome assembly). The reads showed excellent mappability, with a mean of perfect nucleotide matches (NM tag; Levenshtein distance) for all fCpGs across samples of 96.41% (range of 73.31\u201397.90%), and mean mapping quality (MAPQ) of all the reads covering all fCpGs across samples of 59.510 (range of 2\u201360). 
Subsequently, long reads were phased using variants called using Clair 3 (v1.0.9, model r941_prom_hac_g360\u2009+\u2009g422)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 58\" title=\"Zheng, Z. et al. Symphonizing pileup and full-alignment for deep learning-based long-read variant calling. Nat. Comput. Sci. 2, 797&#x2013;803 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR58\" id=\"ref-link-section-d420590567e2643\" target=\"_blank\" rel=\"noopener\">58<\/a> with the Longphase package (v1.7)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 59\" title=\"Lin, J. H., Chen, L. C., Yu, S. C. &amp; Huang, Y. T. LongPhase: an ultra-fast chromosome-scale phasing algorithm for small and large variants. Bioinformatics 38, 1816&#x2013;1822 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR59\" id=\"ref-link-section-d420590567e2647\" target=\"_blank\" rel=\"noopener\">59<\/a>. The methylation status of each CpG was called using the modcall function within the Longphase package. At fCpGs, only 2.7% of the reads were non-canonical bases (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#Fig7\" target=\"_blank\" rel=\"noopener\">2d<\/a>). The variant allele frequency (VAF) of these mutations tended to be low and was negatively correlated with the coverage at that site (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#MOESM1\" target=\"_blank\" rel=\"noopener\">4a<\/a>). Hence, the majority of these non-canonical base pairs are probably due to errors in nucleotide assignment. 
There is also no association between the methylation status of different reads and the variants present within a 50-bp window of each fCpG locus (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#MOESM1\" target=\"_blank\" rel=\"noopener\">4b<\/a>). Hence, assessment of fCpG methylation via bead array was not majorly confounded by miscalled variants. The fCpG methylation patterns seen in the bead array data were replicated in the long-read data (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#Fig7\" target=\"_blank\" rel=\"noopener\">2e<\/a>) and the correlation between the fraction methylated measured via bead array and long-read sequencing at fCpGs was excellent (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#Fig7\" target=\"_blank\" rel=\"noopener\">2e<\/a>). The same correspondence was observed in WGBS data (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#Fig7\" target=\"_blank\" rel=\"noopener\">2f<\/a>).<\/p>\n<p>To assess the intra-sample long-read diversity for each sample, the pairwise Hamming distances were calculated between every read on both haplotypes. The two lists of Hamming distances were concatenated, and the mean calculated as a summary statistic of the read diversity for each sample. 
One normal B cell sample contained only two reads from one haplotype, and zero from the other, and so was excluded from further analysis.<\/p>\n<p>Analysis of scRRBS data<\/p>\n<p>Previously published single-cell reduced representation bisulfite sequencing\u00a0(scRRBS) data were obtained<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 6\" title=\"Gaiti, F. et al. Epigenetic evolution and lineage histories of chronic lymphocytic leukaemia. Nature 569, 576&#x2013;580 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR6\" id=\"ref-link-section-d420590567e2683\" target=\"_blank\" rel=\"noopener\">6<\/a> and fCpG methylation values were extracted for normal B cells from 6 donors and CLL cells from 12 patients. There was a high dropout rate, so to extract meaningful patterns we plotted, as examples, a subset of 40 cells and 20 fCpGs with a high density and overlap of fCpGs across single cells (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#MOESM1\" target=\"_blank\" rel=\"noopener\">5a,b<\/a>).<\/p>\n<p>To compare the full set of data while accounting for the high degree of missing data, we used a metric of heterogeneity at a given fCpG that weights by the number of non-missing values according to:<\/p>\n<p>$${d}_{i}=\\sqrt{\\frac{{n}_{i}({n}_{i}-1)}{2}}\\sigma ({\\beta }_{i})$$<\/p>\n<p>where \\({n}_{i}\\) is the number of non-NaN values for the \\(i\\)th fCpG, \\(\\frac{{n}_{i}({n}_{i}-1)}{2}\\) is the total number of possible pairwise comparisons among \\({n}_{i}\\) values and \\(\\sigma ({\\beta }_{i})\\) is the standard deviation across the\u00a0methylation values of the \\(i\\)th fCpG (Supplementary Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#MOESM1\" target=\"_blank\" rel=\"noopener\">5c<\/a>).<\/p>\n<p>Characterization and annotation of fCpGs<\/p>\n<p>To characterize the genomic and regulatory context of fCpGs, we used a series of statistical analyses and database annotations. We annotated fCpGs using the Illumina manifest and other genomic annotation packages available at Bioconductor, including IlluminaHumanMethylation450kanno.ilmn12.hg19 (v0.6.1) and IlluminaHumanMethylationEPICanno.ilm10b2.hg19 (v0.6.0). We additionally used the packages SNPlocs.Hsapiens.dbSNP155.GRCh38 (v0.99.24) and MafH5.gnomAD.v4.0.GRCh38 (v3.19) to check for any possible germline or somatic genetic confounding on the resulting 978 fCpGs. We found that approximately 60% of fCpGs were reported in the gnomAD v4 database (compared with approximately 65% of the array background), but with a very low MAF (median of 1\u2009\u00d7\u200910<sup>\u22125<\/sup> and mean of 1\u2009\u00d7\u200910<sup>\u22123<\/sup>). In addition, we used the Illumina 450k and EPIC array internal SNP probes and showed that their methylation dynamics are markedly distinct from those of fCpGs in single-timepoint (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#Fig7\" target=\"_blank\" rel=\"noopener\">2c<\/a>) and longitudinal (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#MOESM1\" target=\"_blank\" rel=\"noopener\">3<\/a>) samples. 
Finally, the data-driven gaphunting algorithm available in the minfi R package was applied with all the previously published thresholds and cut-offs<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 57\" title=\"Andrews, S. V., Ladd-Acosta, C., Feinberg, A. P., Hansen, K. D. &amp; Fallin, M. D. &#x201C;Gap hunting&#x201D; to characterize clustered probe signals in Illumina methylation array data. Epigenetics Chromatin 9, 56 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR57\" id=\"ref-link-section-d420590567e2907\" target=\"_blank\" rel=\"noopener\">57<\/a>, which further discarded possible cancer-specific SNV that could confound the methylation values at the 978 identified fCpGs.<\/p>\n<p>We used Chi-squared tests to assess the enrichment of fCpGs in distinct genomic regions or elements. We performed gene-set enrichment analysis on the fCpG-associated genes using gProfiler<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 60\" title=\"Raudvere, U. et al. g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res. 47, W191&#x2013;W198 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR60\" id=\"ref-link-section-d420590567e2914\" target=\"_blank\" rel=\"noopener\">60<\/a>, specifically focusing on the Gene Ontology biological processes<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 61\" title=\"Ashburner, M. et al. Gene Ontology: tool for the unification of biology. Nat. Genet. 
25, 25&#x2013;29 (2000).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR61\" id=\"ref-link-section-d420590567e2918\" target=\"_blank\" rel=\"noopener\">61<\/a> and the Human Protein Atlas<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 62\" title=\"Uhl&#xE9;n, M. et al. Tissue-based map of the human proteome. Science 347, 1260419 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR62\" id=\"ref-link-section-d420590567e2922\" target=\"_blank\" rel=\"noopener\">62<\/a>. The statistical domain space was limited to genes targeted by at least one CpG in the 389,180 candidate CpG set and significance was determined using the g:SCS algorithm<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 63\" title=\"Reimand, J., Kull, M., Peterson, H., Hansen, J. &amp; Vilo, J. g:Profiler &#x2014; a web-based toolset for functional profiling of gene lists from large-scale experiments. Nucleic Acids Res. 35, W193&#x2013;W200 (2007).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR63\" id=\"ref-link-section-d420590567e2926\" target=\"_blank\" rel=\"noopener\">63<\/a>. Previous chromatin segmentation of normal and neoplastic B cells was used to assess the chromatin-state enrichment of fCpG<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 14\" title=\"Duran-Ferrer, M. et al. The proliferative history shapes the DNA methylome of B-cell tumors and predicts clinical outcome. Nat. 
Cancer 1, 1066&#x2013;1081 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR14\" id=\"ref-link-section-d420590567e2930\" target=\"_blank\" rel=\"noopener\">14<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 64\" title=\"Beekman, R. et al. The reference epigenome and regulatory chromatin landscape of chronic lymphocytic leukemia. Nat. Med. 24, 868&#x2013;880 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR64\" id=\"ref-link-section-d420590567e2933\" target=\"_blank\" rel=\"noopener\">64<\/a>.<\/p>\n<p>fCpGs were checked for their overlap with previous \u2018epigenetic clocks\u2019, including mitotic<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 14\" title=\"Duran-Ferrer, M. et al. The proliferative history shapes the DNA methylome of B-cell tumors and predicts clinical outcome. Nat. Cancer 1, 1066&#x2013;1081 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR14\" id=\"ref-link-section-d420590567e2940\" target=\"_blank\" rel=\"noopener\">14<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Yang, Z. et al. Correlation of an epigenetic mitotic clock with cancer risk. Genome Biol. 17, 205 (2016).\" href=\"#ref-CR65\" id=\"ref-link-section-d420590567e2943\">65<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Teschendorff, A. E. A comparison of epigenetic mitotic-like clocks for cancer risk prediction. Genome Med. 12, 56 (2020).\" href=\"#ref-CR66\" id=\"ref-link-section-d420590567e2943_1\">66<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Youn, A. &amp; Wang, S. 
The MiAge Calculator: a DNA methylation-based mitotic age calculator of human tissue types. Epigenetics 13, 192&#x2013;206 (2018).\" href=\"#ref-CR67\" id=\"ref-link-section-d420590567e2943_2\">67<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 68\" title=\"Zhou, W. et al. DNA methylation loss in late-replicating domains is linked to mitotic cell division. Nat. Genet. 50, 591&#x2013;602 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR68\" id=\"ref-link-section-d420590567e2946\" target=\"_blank\" rel=\"noopener\">68<\/a>, chronological age<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Bocklandt, S. et al. Epigenetic predictor of age. PLoS ONE 6, e14821 (2011).\" href=\"#ref-CR69\" id=\"ref-link-section-d420590567e2950\">69<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Garagnani, P. et al. Methylation of ELOVL2 gene as a new epigenetic marker of age. Aging Cell 11, 1132&#x2013;1134 (2012).\" href=\"#ref-CR70\" id=\"ref-link-section-d420590567e2950_1\">70<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Hannum, G. et al. Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol. Cell 49, 359&#x2013;367 (2013).\" href=\"#ref-CR71\" id=\"ref-link-section-d420590567e2950_2\">71<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Horvath, S. DNA methylation age of human tissues and cell types. Genome Biol. 
14, R115 (2013).\" href=\"#ref-CR72\" id=\"ref-link-section-d420590567e2950_3\">72<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Lin, Q. et al. DNA methylation levels at individual age-associated CpG sites can be indicative for life expectancy. Aging 8, 394&#x2013;401 (2016).\" href=\"#ref-CR73\" id=\"ref-link-section-d420590567e2950_4\">73<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Vidal-Bralo, L., Lopez-Golan, Y. &amp; Gonzalez, A. Simplified assay for epigenetic age estimation in whole blood of adults. Front. Genet. 7, 209192 (2016).\" href=\"#ref-CR74\" id=\"ref-link-section-d420590567e2950_5\">74<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Weidner, C. I. et al. Aging of blood can be tracked by DNA methylation changes at just three CpG sites. Genome Biol. 15, R24 (2014).\" href=\"#ref-CR75\" id=\"ref-link-section-d420590567e2950_6\">75<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Zhang, Q. et al. Improved precision of epigenetic clock estimates across tissues and its implication for biological ageing. Genome Med. 11, 54 (2019).\" href=\"#ref-CR76\" id=\"ref-link-section-d420590567e2950_7\">76<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Horvath, S. et al. Epigenetic clock for skin and blood cells applied to Hutchinson Gilford progeria syndrome and ex vivo studies. Aging 10, 1758&#x2013;1775 (2018).\" href=\"#ref-CR77\" id=\"ref-link-section-d420590567e2950_8\">77<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 78\" title=\"Shireby, G. L. et al. 
Recalibrating the epigenetic clock: implications for assessing biological age in the human cortex. Brain 143, 3763&#x2013;3775 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR78\" id=\"ref-link-section-d420590567e2953\" target=\"_blank\" rel=\"noopener\">78<\/a>, gestational age<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Bohlin, J. et al. Prediction of gestational age based on genome-wide differentially methylated regions. Genome Biol. 17, 207 (2016).\" href=\"#ref-CR79\" id=\"ref-link-section-d420590567e2957\">79<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Knight, A. K. et al. An epigenetic clock for gestational age at birth based on blood methylation data. Genome Biol. 17, 206 (2016).\" href=\"#ref-CR80\" id=\"ref-link-section-d420590567e2957_1\">80<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Lee, Y. et al. Placental epigenetic clocks: estimating gestational age using placental DNA methylation levels. Aging 11, 4238&#x2013;4253 (2019).\" href=\"#ref-CR81\" id=\"ref-link-section-d420590567e2957_2\">81<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Mayne, B. T. et al. Accelerated placental aging in early onset preeclampsia pregnancies identified by DNA methylation. Epigenomics 9, 279&#x2013;289 (2017).\" href=\"#ref-CR82\" id=\"ref-link-section-d420590567e2957_3\">82<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 83\" title=\"McEwen, L. M. et al. The PedBE clock accurately estimates DNA methylation age in pediatric buccal cells. Proc. Natl Acad. Sci. 
USA 117, 23329&#x2013;23335 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR83\" id=\"ref-link-section-d420590567e2960\" target=\"_blank\" rel=\"noopener\">83<\/a>, biological age and mortality<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Belsky, D. W. et al. DunedinPACE, a DNA methylation biomarker of the pace of aging. eLife 11, e73420 (2022).\" href=\"#ref-CR84\" id=\"ref-link-section-d420590567e2964\">84<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Levine, M. E. et al. An epigenetic biomarker of aging for lifespan and healthspan. Aging 10, 573&#x2013;591 (2018).\" href=\"#ref-CR85\" id=\"ref-link-section-d420590567e2964_1\">85<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 86\" title=\"Lu, A. T. et al. DNA methylation GrimAge strongly predicts lifespan and healthspan. Aging 11, 303&#x2013;327 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR86\" id=\"ref-link-section-d420590567e2967\" target=\"_blank\" rel=\"noopener\">86<\/a> and trait predictors<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 87\" title=\"McCartney, D. L. et al. Epigenetic prediction of complex traits and death. Genome Biol. 19, 136 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR87\" id=\"ref-link-section-d420590567e2971\" target=\"_blank\" rel=\"noopener\">87<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 88\" title=\"Liang, X. et al. 
DNA methylation signature on phosphatidylethanol, not on self-reported alcohol consumption, predicts hazardous alcohol consumption in two distinct populations. Mol. Psychiatry 26, 2238&#x2013;2253 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR88\" id=\"ref-link-section-d420590567e2974\" target=\"_blank\" rel=\"noopener\">88<\/a>. The package methylCIPHER (<a href=\"https:\/\/github.com\/MorganLevineLab\/methylCIPHER\" target=\"_blank\" rel=\"noopener\">https:\/\/github.com\/MorganLevineLab\/methylCIPHER<\/a>) was used to obtain the CpGs for most of the epigenetic clocks. The package methylclock (v1.10.0) was used to calculate all epigenetic clocks but epiCMIT, which was derived as previously described<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 14\" title=\"Duran-Ferrer, M. et al. The proliferative history shapes the DNA methylome of B-cell tumors and predicts clinical outcome. Nat. Cancer 1, 1066&#x2013;1081 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR14\" id=\"ref-link-section-d420590567e2986\" target=\"_blank\" rel=\"noopener\">14<\/a>.<\/p>\n<p>CLL RNA sequencing data<\/p>\n<p>Previously available RNA sequencing data for 294 patients with CLL were obtained<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 33\" title=\"Puente, X. S. et al. Non-coding recurrent mutations in chronic lymphocytic leukaemia. Nature 526, 519&#x2013;524 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR33\" id=\"ref-link-section-d420590567e2998\" target=\"_blank\" rel=\"noopener\">33<\/a> and processed as previously described<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 26\" title=\"Nadeu, F. et al. 
Genomic and epigenomic insights into the origin, pathogenesis, and clinical behavior of mantle cell lymphoma subtypes. Blood 136, 1419&#x2013;1432 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR26\" id=\"ref-link-section-d420590567e3002\" target=\"_blank\" rel=\"noopener\">26<\/a>. Matched RNA sequencing data and DNA methylation data for the same patients at the same timepoint were available for 224 patients with CLL. Transcript per million counts were used to represent differential gene expression values across genes and samples. We used the gene annotation provided in the R Bioconductor package IlluminaHumanMethylationEPICanno.ilm10b2.hg19 to classify genes associated with fCpGs. Genes targeted by any fCpG were considered \u2018fCpG genes\u2019.<\/p>\n<p>In each methylation sample, the 978 fCpGs were discretized as homozygous demethylated, heterozygous methylated or homozygous methylated (coded as [0,1,2], respectively). This was done by separately fitting a \u03b2-mixture model with three components to each sample using Stan<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 89\" title=\"Carpenter, B. et al. Stan: a probabilistic programming language. J. Stat. Softw. 76, 1 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR89\" id=\"ref-link-section-d420590567e3009\" target=\"_blank\" rel=\"noopener\">89<\/a> and extracting the component mixture probability. The gene expression values for genes classified as having an fCpG with 0, 1 or 2 alleles methylated were plotted as previously described.<\/p>\n<p>DNA methylation data from\u00a0normal\u00a0blood samples<\/p>\n<p>External DNA methylation data were downloaded from the Gene Expression Omnibus database using the GEOquery R package (v2.72.0). 
For sorted immune cells, these include <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/geo\/query\/acc.cgi?acc=GSE137594\" target=\"_blank\" rel=\"noopener\">GSE137594<\/a> and <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/geo\/query\/acc.cgi?acc=GSE184269\" target=\"_blank\" rel=\"noopener\">GSE184269<\/a>. For whole-blood samples, these include <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/geo\/query\/acc.cgi?acc=GSE72773\" target=\"_blank\" rel=\"noopener\">GSE72773<\/a>, <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/geo\/query\/acc.cgi?acc=GSE55763\" target=\"_blank\" rel=\"noopener\">GSE55763<\/a>, <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/geo\/query\/acc.cgi?acc=GSE40279\" target=\"_blank\" rel=\"noopener\">GSE40279<\/a> and <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/geo\/query\/acc.cgi?acc=GSE36054\" target=\"_blank\" rel=\"noopener\">GSE36054<\/a>. Data were analysed with the normalization procedure used in each study together with the metadata provided. Mean and standard deviation for fCpGs were calculated with fCpGs present in the provided normalized matrices.<\/p>\n<p>A stochastic model of fCpGs in a\u00a0growing population<\/p>\n<p>We built a generative computational model of how the patterns of fCpGs vary over time (t) according to the evolutionary history of a cancer. Initially, our model focused on neutral evolution, before expanding to non-neutral modes of tumour evolution below. For the full explanation of the model, see the\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#MOESM1\" target=\"_blank\" rel=\"noopener\">Supplementary Information<\/a>.<\/p>\n<p>Our model was parameterized in terms of the age of the patient at which the MRCA emerged (\u03c4), the exponential growth rate of the cancer (\u03b8) and the epigenetic switching rates of the fCpGs (\u03bc, \u03bd, \u03b3 and \u03b6). 
The model was partitioned into two phases: before and after the emergence of the MRCA. At time t\u2009=\u20090, the fCpGs were assumed to be equally likely to be homozygously methylated or demethylated. The fCpG status of the MRCA at time t\u2009=\u2009\u03c4 was calculated by applying matrix exponentiation.<\/p>\n<p>The second phase of the model consisted of a discrete time Markov process. The effective population size of the growing cancer was modelled with a deterministic exponential growth equation, \\({N}_{e}={e}^{\\theta (T-\\tau )}\\). Each fCpG was considered independently; at each time step, t\u2009\u2192\u2009t\u2009+\u2009\u03b4t, the numbers of homozygous-methylated (m), heterozygous-methylated (k) and homozygous-demethylated (w) cells at a specific fCpG were updated according to the epigenetic switching rates.<\/p>\n<p>At the time of sampling, T, the methylation fraction of each simulated fCpG was calculated by summing the number of methylated alleles and normalizing by the total number of alleles in the population:<\/p>\n<p>$${\\beta }_{c}=\\frac{k+2m}{2{N}_{e}}$$<\/p>\n<p>We further accounted for contaminating normal cells and the technical noise introduced by the methylation bead array. The methylation of the contaminated samples was assumed to be an average of the cancer methylation, \u03b2c(t), weighted by the tumour purity \u03c1, and the average of the normal population, \u03b2n, weighted by 1\u2009\u2212\u2009\u03c1. Following our previous work, the bead array was assumed to saturate at extreme methylation values, shifting the minimum and maximum methylation by \u03b4 and \u03b5, respectively<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 4\" title=\"Gabbutt, C. et al. Fluctuating methylation clocks for cell lineage tracing at high temporal resolution in human tissues. Nat. Biotechnol. 
40, 720&#x2013;730 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR4\" id=\"ref-link-section-d420590567e3255\" target=\"_blank\" rel=\"noopener\">4<\/a>. The noise of the bead array was assumed to be \u03b2-distributed, with precision parameter \u03ba.<\/p>\n<p>Non-neutral models of tumour evolution<\/p>\n<p>Alongside our model of neutral exponentially growing cancer populations, we devised two alternative models of cancer growth:<\/p>\n<ol class=\"u-list-style-none\">\n<li>\n<p>(1) A subclonal selection model in which a single cell within the cancer develops a selective advantage and begins to grow at an increased rate.<\/p>\n<\/li>\n<li>\n<p>(2) An independent clonal origins model, in which a patient has developed two distinct cancers concurrently.<\/p>\n<\/li>\n<\/ol>\n<p>For the subclonal selection model, we replaced the growth rate (\u03b8) and the time of the MRCA (\u03c4) with the growth rate and time of the MRCA of the initial, slower-growing population (\u03b81 and \u03c41, respectively), and those of the more recently emerging, faster-growing population (\u03b82 and \u03c42), constraining \u03c41\u2009&lt;\u2009\u03c42 and \u03b81\u2009&lt;\u2009\u03b82 (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#Fig13\" target=\"_blank\" rel=\"noopener\">8a<\/a>). 
We assumed that the initial cancer population began growing exponentially at \u03c41 as above, but at time t\u2009=\u2009\u03c42, we selected a single cell with a set of fCpG states drawn according to the cancer population and allowed this second population to grow concurrently with a growth rate \u03b82.<\/p>\n<p>The independent-cancer model followed the same scheme as the nested subclonal selection model, except the methylation status of the emerging cancer was that of an independent cell that experienced random fluctuations between t\u2009=\u20090 and t\u2009=\u2009\u03c42.<\/p>\n<p>If we let the number of cells in the less fit subclone in each methylation state be {m1, k1, w1} and in the fitter subclone be {m2, k2, w2}, following the convention above, then in both cases the measured methylation patterns at the time of sampling are:<\/p>\n<p>$${\\beta }_{c}(T)=\\frac{{k}_{1}(T)+2{m}_{1}(T)+{k}_{2}(T)+2{m}_{2}(T)}{2{N}_{e}(T)}$$<\/p>\n<p>where \\({N}_{e}(T)={e}^{{\\theta }_{1}(T-{\\tau }_{1})}+{e}^{{\\theta }_{2}(T-{\\tau }_{2})}\\).<\/p>\n<p>Adaptation of simulations to a\u00a0longitudinal setting<\/p>\n<p>We modified the simulations of how the fCpG methylation distribution changes over time to allow for multiple sequential sample collections. These simulations allow for neutral evolution, independent clones, a single subclonal expansion or two subclonal expansions, which can either be nested or emerge from the clonal trunk in parallel. This required pre-specification of sampling times, along with the emergence times of any subclones or independent clones, which we collected to form a set of \u2018landmark times\u2019. The discrete time steps of the simulation were split into phases between the landmark times, which evolved according to the discrete time Markov process outlined above. 
At each sampling time, the fCpG methylation fraction was calculated as above and stored as a column in the output matrix.<\/p>\n<p>Prior functions<\/p>\n<p>For each methylation array blood sample, we had matched age (T) and purity (\u03c1) information. Hence, the parameters to be inferred are the growth rate (\u03b8), the age of the patient when the MRCA emerged (\u03c4), the epigenetic switching rates (\u03bc, \u03bd, \u03b3, \u03b6), the average methylated fraction of contaminating normal cells (\u03b2n), the \u03b2-offsets from 0 and 1 due to the background noise on the methylation array (\u03b4 and \u03b5, respectively) and the precision of the \u03b2-distributed noise (\u03ba).<\/p>\n<p>These parameters are constrained either to be positive (\u03b8, \u03bc, \u03bd, \u03b3, \u03b6, \u03ba\u2009&gt; 0) or to lie within a specified range (0\u2009&lt;\u2009\u03c4\/T, \u03b4, \u03b5\u2009&lt;\u20091). \u03bd and \u03b6 were normalized by \u03bc and \u03b3, respectively.<\/p>\n<p>The priors are as follows:<\/p>\n<p>$$\\theta  \\sim {\\rm{lognormal}}(3,2)$$<\/p>\n<p>$$\\frac{\\tau }{T} \\sim {\\rm{beta}}(2,2)$$<\/p>\n<p>$$\\mu  \\sim {\\rm{halfnormal}}(0,0.05)$$<\/p>\n<p>$$\\gamma  \\sim {\\rm{halfnormal}}(0,0.05)$$<\/p>\n<p>$$\\frac{\\upsilon }{\\mu } \\sim {\\rm{lognormal}}(1,0.7)$$<\/p>\n<p>$$\\frac{\\zeta }{\\gamma } \\sim {\\rm{lognormal}}(1,0.7)$$<\/p>\n<p>$${\\beta }_{n} \\sim {\\rm{beta}}(2,2)$$<\/p>\n<p>$$\\delta  \\sim {\\rm{beta}}(5,95)$$<\/p>\n<p>$${\\epsilon } \\sim {\\rm{beta}}(95,5)$$<\/p>\n<p>$$\\kappa  \\sim {\\rm{halfnormal}}(100,30)$$<\/p>\n<p>When fitting non-neutral models of tumour growth, the inference was parameterized in terms of the relative growth rate of the fitter subclone, \\({\\tilde{\\theta }}_{2}=\\frac{{\\theta }_{2}}{{\\theta }_{1}}\\), and the fraction of the population consisting of the fitter subclone, \\(f=\\frac{{e}^{{\\theta }_{2}(t-{\\tau }_{2})}}{{e}^{{\\theta }_{1}(t-{\\tau }_{1})}+{e}^{{\\theta }_{2}(t-{\\tau }_{2})}}\\). 
Evaluating f at the sampling time (t\u2009=\u2009T), the age at which the second clone emerges is then:<\/p>\n<p>$${\\tau }_{2}=T-\\frac{(T-{\\tau }_{1}){\\theta }_{1}}{{\\theta }_{2}}-\\frac{{\\rm{logit}}(f)}{{\\theta }_{2}}$$<\/p>\n<p>This parameterization induces less correlation in the resulting posterior, which greatly improves the sampling efficiency. The priors on these additional parameters are:<\/p>\n<p>$$\\frac{{\\tau }_{1}}{T} \\sim {\\rm{beta}}(2,2)$$<\/p>\n<p>$${\\widetilde{\\theta }}_{2} \\sim {\\rm{lognormal}}(1,0.7)$$<\/p>\n<p>$$f \\sim {\\rm{beta}}(2,2)$$<\/p>\n<p>All the other priors were the same as in the neutral case.<\/p>\n<p>Bayesian inference<\/p>\n<p>We developed a stochastic estimator of the log-likelihood function at a given set of parameters by simulating the fCpG methylation distribution a large number of times, correcting for the bias inherent in using a finite number of simulations and penalizing the log-likelihood for extreme values of Ne (see <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#MOESM1\" target=\"_blank\" rel=\"noopener\">Supplementary Information<\/a> for details).<\/p>\n<p>The standard Bayesian algorithms developed to infer the posterior for a given set of data (for example,\u00a0Markov chain Monte Carlo (MCMC), nested sampling) are typically used when the log-likelihood is analytically tractable and can be calculated exactly. It has been shown that, as long as the stochastic approximation of the log-likelihood is unbiased, MCMC methods can obtain an exact Bayesian inference of the true posterior, as in pseudo-marginal Metropolis\u2013Hastings<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 90\" title=\"Andrieu, C. &amp; Roberts, G. O. The pseudo-marginal approach for efficient Monte Carlo computations. Ann. Statist. 
&#010;                https:\/\/doi.org\/10.1214\/07-AOS574&#010;                &#010;               (2009).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR90\" id=\"ref-link-section-d420590567e4874\" target=\"_blank\" rel=\"noopener\">90<\/a>.<\/p>\n<p>Here we used a nested sampling approach using the dynesty package<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Skilling, J. Nested sampling. In AIP Conference Proceedings 735 (eds Fischer, R. et al.) 395&#x2013;405 (AIP Publishing, 2004).\" href=\"#ref-CR91\" id=\"ref-link-section-d420590567e4881\">91<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Skilling, J. Nested sampling for general Bayesian computation. Bayesian Anal. 1, 833&#x2013;860 (2006).\" href=\"#ref-CR92\" id=\"ref-link-section-d420590567e4881_1\">92<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 93\" title=\"Speagle, J. S. dynesty: A dynamic nested sampling package for estimating Bayesian posteriors and evidences. Mon. Not. R. Astron. Soc. 493, 3132&#x2013;3158 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR93\" id=\"ref-link-section-d420590567e4884\" target=\"_blank\" rel=\"noopener\">93<\/a>. Unlike pseudo-marginal Metropolis\u2013Hastings, nested sampling is able to efficiently explore multimodal posterior landscapes (which can occur under the subclonal and independent cancer models).<\/p>\n<p>Model selection for the mode\u00a0of tumour evolution<\/p>\n<p>We used an expected log pointwise predictive density<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 94\" title=\"Vehtari, A., Gelman, A. &amp; Gabry, J. 
Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat. Comput. 27, 1413&#x2013;1432 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR94\" id=\"ref-link-section-d420590567e4896\" target=\"_blank\" rel=\"noopener\">94<\/a> approach to compare our competing models of evolution for each sample using the arviz Python package<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 95\" title=\"Kumar, R., Carroll, C., Hartikainen, A. &amp; Martin, O. ArviZ a unified library for exploratory analysis of Bayesian models in Python. J. Open Source Softw. 4, 1143 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR95\" id=\"ref-link-section-d420590567e4900\" target=\"_blank\" rel=\"noopener\">95<\/a>, which uses PSIS-LOO-CV to compare the out-of-sample prediction accuracy between models while naturally penalizing more complex models. This required the log-likelihood per data point and the posterior predictive for every point in the posterior. The weights of the respective models were calculated using pseudo-Bayesian model averaging using Akaike-type weighting, stabilized using the Bayesian bootstrap<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 96\" title=\"Yao, Y., Vehtari, A., Simpson, D. &amp; Gelman, A. Using stacking to average Bayesian predictive distributions. Bayesian Anal. 
13, 917&#x2013;1007 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR96\" id=\"ref-link-section-d420590567e4904\" target=\"_blank\" rel=\"noopener\">96<\/a>.<\/p>\n<p>CLL and Richter transformation genomic analyses<\/p>\n<p>Previous mutated annotation files from WES<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 46\" title=\"Knisbacher, B. A. et al. Molecular map of chronic lymphocytic leukemia and its impact on outcome. Nat. Genet. 54, 1664&#x2013;1674 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR46\" id=\"ref-link-section-d420590567e4916\" target=\"_blank\" rel=\"noopener\">46<\/a> and WGS<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 27\" title=\"Nadeu, F. et al. Detection of early seeding of Richter transformation in chronic lymphocytic leukemia. Nat. Med. 28, 1662&#x2013;1671 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR27\" id=\"ref-link-section-d420590567e4920\" target=\"_blank\" rel=\"noopener\">27<\/a> data were used to further validate our distinct EVOFLUx evolutionary modes (that is, neutral, subclonal and independent) and Richter transformation phylogenies.<\/p>\n<p>Subclonal deconvolution of WES and WGS data<\/p>\n<p>To detect subclones in bulk WES and WGS data, we used MOBSTER<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 43\" title=\"Caravagna, G. et al. Subclonal reconstruction of tumors by using machine learning and population genetics. Nat. Genet. 
52, 898&#x2013;907 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR43\" id=\"ref-link-section-d420590567e4932\" target=\"_blank\" rel=\"noopener\">43<\/a>, which fits the VAF spectrum with a mixture model containing a Pareto distribution to account for the neutral tail<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 97\" title=\"Williams, M. J., Werner, B., Barnes, C. P., Graham, T. A. &amp; Sottoriva, A. Identification of neutral tumor evolution across cancer types. Nat. Genet. 48, 238&#x2013;244 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR97\" id=\"ref-link-section-d420590567e4936\" target=\"_blank\" rel=\"noopener\">97<\/a> and a variable number of \u03b2-distributions to account for the clonal and subclonal peaks.<\/p>\n<p>We ran MOBSTER using the default parameters, except that we applied a minimum 5% VAF threshold and, in WES samples, lowered the minimum number of mutations required to form a cluster to five because of the low number of mutations. We then manually quality controlled all 377 WES samples and 10 WGS samples, tuning the fitting parameters to better represent the data (for instance, when the clonal peak had been called at a low frequency despite the median tumour purity being 95%).<\/p>\n<p>Phylogenetic inference of longitudinal methylation data<\/p>\n<p>A novel Bayesian phylogenetic method was used to reconstruct the evolutionary relationships and the time to MRCA of longitudinal samples from the same patients. This was carried out in the BEAST (v1.8.4) framework<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 98\" title=\"Drummond, A. J. &amp; Rambaut, A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 
7, 214 (2007).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR98\" id=\"ref-link-section-d420590567e4952\" target=\"_blank\" rel=\"noopener\">98<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 99\" title=\"Drummond, A. J., Suchard, M. A., Xie, D. &amp; Rambaut, A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969&#x2013;1973 (2012).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR99\" id=\"ref-link-section-d420590567e4955\" target=\"_blank\" rel=\"noopener\">99<\/a> using custom models implemented in PISCA<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 100\" title=\"Martinez, P. et al. Evolution of Barrett&#x2019;s esophagus through space and time at single-crypt and whole-biopsy levels. Nat. Commun. 9, 794 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR100\" id=\"ref-link-section-d420590567e4959\" target=\"_blank\" rel=\"noopener\">100<\/a> (v1.1; available from <a href=\"https:\/\/github.com\/adamallo\/PISCA\" target=\"_blank\" rel=\"noopener\">https:\/\/github.com\/adamallo\/PISCA<\/a>).<\/p>\n<p>EVOFLUx provided an estimate of the age of the patient when the MRCA of each bulk sample emerged. To estimate the methylation status of each fCpG at the MRCA of the sample in each of our longitudinal samples, we discretized the fCpGs as described above (see the section \u2018CLL RNA sequencing data\u2019).<\/p>\n<p>We implemented a four-parameter biallelic binary substitution model analogous to the pre-growth EVOFLUx model in PISCA. This plugin contains all the required statistical machinery to use this model for somatic phylogenetic estimation. 
The biallelic binary substitution model has three relative rate parameters: (1) heterozygous methylation \\(\\tilde{\\upsilon }\\), (2) homozygous demethylation \\(\\tilde{\\gamma }\\), and (3) heterozygous demethylation \\(\\tilde{\\zeta }\\), where homozygous methylation \\(\\tilde{\\mu }\\) was normalized to 1. For all relative transition rate parameters, a log-normal prior with mean of 1 and standard deviation of 0.6 was used, with a half-normal prior with mean of 0 and standard deviation of 0.13 for the molecular clock rate, using a strict clock model for the rate of evolution across the tree. Two demographic tree models, constant population size<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 101\" title=\"Kingman, J. F. C. The coalescent. Stoch. Process Appl. 13, 235&#x2013;248 (1982).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR101\" id=\"ref-link-section-d420590567e5068\" target=\"_blank\" rel=\"noopener\">101<\/a> and exponential growth<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 102\" title=\"Griffiths, R. C. &amp; Tavar&#xE9;, S. Sampling theory for neutral alleles in a varying environment. Phil. Trans. R. Soc. Lond. B 344, 403&#x2013;410 (1994).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR102\" id=\"ref-link-section-d420590567e5073\" target=\"_blank\" rel=\"noopener\">102<\/a>, were compared by marginal likelihood estimation using path-sampling<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 103\" title=\"Baele, G. et al. Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty. Mol. Biol. Evol. 
29, 2157&#x2013;2167 (2012).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR103\" id=\"ref-link-section-d420590567e5077\" target=\"_blank\" rel=\"noopener\">103<\/a> and a constant population model was deemed more appropriate.<\/p>\n<p>MCMC chains were run for 100 million generations, sampled every 100,000 generations, and convergence was assessed using Tracer (v1.7)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 104\" title=\"Rambaut, A., Drummond, A. J., Xie, D., Baele, G. &amp; Suchard, M. A. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst. Biol. 67, 901&#x2013;904 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR104\" id=\"ref-link-section-d420590567e5084\" target=\"_blank\" rel=\"noopener\">104<\/a>, ensuring effective sample sizes\u00a0(ESS) greater than 500 for all parameters. Maximum clade credibility trees were then generated using 10% burn-in and median node heights. The resulting trees were plotted using ggtree<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 105\" title=\"Yu, G., Smith, D. K., Zhu, H., Guan, Y. &amp; Lam, T. T. Y. ggtree: An r package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28&#x2013;36 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR105\" id=\"ref-link-section-d420590567e5088\" target=\"_blank\" rel=\"noopener\">105<\/a>.<\/p>\n<p>Phylogenetic inference of SNVs from WGS data<\/p>\n<p>Each bulk sample is represented by a set of clonal mutations found during the deconvolution of WGS data (see above). Where a mutation was deemed absent in the clonal peak, the reference nucleotide was used. 
Mutational signature assignment<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 106\" title=\"D&#xED;az-Gay, M. et al. Assigning mutational signatures to individual samples and individual somatic mutations with SigProfilerAssignment. Bioinformatics 39, 12 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR106\" id=\"ref-link-section-d420590567e5100\" target=\"_blank\" rel=\"noopener\">106<\/a> was used to select mutations in the clock-like SBS1 channel<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 107\" title=\"Tate, J. G. et al. COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res. 47, D941&#x2013;D947 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR107\" id=\"ref-link-section-d420590567e5104\" target=\"_blank\" rel=\"noopener\">107<\/a>. BEAST (v1.10)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 108\" title=\"Suchard, M. A. et al. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 4, vey016 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR108\" id=\"ref-link-section-d420590567e5108\" target=\"_blank\" rel=\"noopener\">108<\/a> was then used with the simple binary substitution model (as SBS1 effectively represents just C-to-T substitutions), a strict clock model, a constant population size prior<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 101\" title=\"Kingman, J. F. C. The coalescent. Stoch. Process Appl. 
13, 235&#x2013;248 (1982).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#ref-CR101\" id=\"ref-link-section-d420590567e5112\" target=\"_blank\" rel=\"noopener\">101<\/a> and a flat prior on the age of MRCA (from zero to earliest patient sample), with ancestral state estimation at the root. Chains were run and ESS values assessed as described above. The distances between the ancestral state of the root at each MCMC state and the clock rate were used to calculate the expected evolutionary distance between the root and the known germline. This was used to inform the length of the branch between germline (at birth) and the MRCA of the samples.<\/p>\n<p>Survival analysis<\/p>\n<p>Clinical analyses were performed in CLL for TTFT and overall survival from the time of sampling. Tumour growth rate (\u03b8), effective population size (Ne) and epigenetic switching rates were analysed as continuous variables in univariate Cox regression models for both TTFT and overall survival. The effect sizes of the HRs for each evolutionary variable were analysed using different scaling factors. In particular, the growth rate was analysed assuming exponential growth (that is, for \u03b8\u2009=\u20091, the population is e\u2009\u2248\u20092.72 times bigger per year), the Ne was considered per million cells, and the cancer age or time from the MRCA was analysed per 10 years. Individual switching rate parameters (\u03bc, \u03bd, \u03b3 and \u03b6) were largely uninformative of prognosis and were summarized into a mean epigenetic switching rate, which was scaled by a factor of 100. In addition, growth rate and effective population size were analysed as continuous variables in multivariate Cox regression models together with TP53 aberrations (considering mutations and deletions together), IGHV gene mutational status and the age of patients at sampling. 
Kaplan\u2013Meier curves were generated for low and high growth rates and effective population size within IGHV subtypes using the maximally selected log-rank statistic implemented in the maxstat package (v0.7-25). P values from Kaplan\u2013Meier curves were derived using the log-rank statistic. The survival (v3.5-7), survminer (v0.4.9) and ggsurvfit (v0.3.1) packages were used under R (v4.3.1). Plots were generated using ggplot2 (v3.5.2).<\/p>\n<p>Estimating\u00a0the rate of change in lymphocyte counts<\/p>\n<p>Historical records of the absolute number of lymphocytes in blood obtained via haemocytometer\u00a0were collected\u00a0for patients with CLL over the whole disease course (that is, an approximation of the number of malignant CLL cells in blood). In 231 patients with CLL, we could obtain at least 10 sample timepoints (that is, at least 10 medical appointments, median n\u2009=\u200927 and mean n\u2009=\u200934) before the first treatment, allowing us to track the natural history of the disease before treatment intervention for the tumour (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#MOESM1\" target=\"_blank\" rel=\"noopener\">10<\/a>). We fitted a linear model to each of the 231 cases, obtained the slope of the observed log number of lymphocytes over time (that is, the coefficient of the univariate linear model) and compared it with growth rate estimates derived from EVOFLUx.<\/p>\n<p>Statistical analysis<\/p>\n<p>All statistical tests in the study were two-sided. 
Appropriate multiple test correction, such as the Holm\u2013Sidak correction, is noted when applied.<\/p>\n<p>Reporting summary<\/p>\n<p>Further information on research design is available in the\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09374-4#MOESM2\" target=\"_blank\" rel=\"noopener\">Nature Portfolio Reporting Summary<\/a> linked to this article.<\/p>\n","protected":false},"excerpt":{"rendered":"Assembly and quality control of DNA methylation data We assembled and processed with a harmonized pipeline14 (v4.1; see&hellip;\n","protected":false},"author":2,"featured_media":413883,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11],"tags":[142206,105,3965,142207,29323,3966,70,16,15],"class_list":{"0":"post-413882","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-health","8":"tag-genetic-interaction","9":"tag-health","10":"tag-humanities-and-social-sciences","11":"tag-hydrolases","12":"tag-metabolic-disorders","13":"tag-multidisciplinary","14":"tag-science","15":"tag-uk","16":"tag-united-kingdom"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@uk\/115181602326252539","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/413882","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/comments?post=413882"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/413882\/revisions"}],"wp:featuredmedia":
[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media\/413883"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media?parent=413882"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/categories?post=413882"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/tags?post=413882"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}