{"id":342267,"date":"2025-08-13T22:25:20","date_gmt":"2025-08-13T22:25:20","guid":{"rendered":"https:\/\/www.europesays.com\/uk\/342267\/"},"modified":"2025-08-13T22:25:20","modified_gmt":"2025-08-13T22:25:20","slug":"clone-copy-number-diversity-is-linked-to-survival-in-lung-cancer","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/uk\/342267\/","title":{"rendered":"Clone copy number diversity is linked to survival in lung cancer"},"content":{"rendered":"<p>The ALPACA algorithm<\/p>\n<p>We introduce ALPACA, an algorithm to infer allele-specific copy-number profiles of every tumour clone, given three inputs that can be inferred by existing methods: estimates of allele-specific fractional copy number from multiple bulk tumour samples; a fixed number of clones and the proportion of each clone in each tumour sample; and a fixed topology of the tumour phylogenetic tree representing the phylogenetic relationships between all clones. We provide an overview of the model underlying ALPACA, the formulation of the problem that ALPACA aims to solve, and the ALPACA algorithms used to solve it. Additional details can be found in the\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#MOESM1\" target=\"_blank\" rel=\"noopener\">Supplementary Information<\/a>.<\/p>\n<p>Copy-number evolutionary model<\/p>\n<p>We assume that a tumour is composed of \\(n\\) distinct tumour clones with their genome partitioned into \\(m\\) copy-number segments owing to the effect of the SCNAs that have been accumulated through tumour evolution. As such, each tumour clone \\(i\\in \\{1,\\,&#8230;,{n}\\}\\) in each genomic segment \\(s\\in \\{1,\\,&#8230;,{m}\\}\\) has a number \\({c}_{s,i}^{{\\rm{A}}}\\) of copies for allele A, and \\({c}_{s,i}^{{\\rm{B}}}\\) for allele B. For a normal diploid clone \\(i=0\\), we know that \\({c}_{s,0}^{{\\rm{A}}}\\,=\\,{c}_{s,0}^{{\\rm{B}}}\\,=\\,1\\). Instead, for any tumour clone \\(i\\in \\{1,\\,&#8230;,{n}\\}\\), the copy numbers \\({c}_{s,i}^{{\\rm{A}}}\\) and \\({c}_{s,i}^{{\\rm{B}}}\\) are unknown, and ALPACA aims to infer them.<\/p>\n<p>We model the evolution of the \\(n\\) tumour clones with a phylogenetic tree \\(T=(V,{E})\\), in which \\(V\\) represents the set of nodes and \\(E\\) represents the related set of edges, which also defines the topology of \\(T\\). Specifically, each clone \\(i\\) is represented by a vertex \\({v}_{i}\\) of the tree; that is, \\({v}_{i}\\in V\\). If tumour clone \\(i\\) is a parent of clone \\(j\\), then \\(({v}_{i},{v}_{j})\\) is an edge of \\(T\\); that is, \\(({v}_{i},{v}_{j})\\in E\\). We assume, without loss of generality, that the MRCA of the tumour clones is clone \\(i=1\\). We also assume that the normal clone \\(i=0\\) is the root vertex of the tree \\({v}_{0}\\in V\\), such that \\(({v}_{0},{v}_{1})\\in E\\).<\/p>\n<p>As we model SCNA evolution of the \\(n\\) tumour clones, each node of the tree \\(T\\) is labelled by related copy numbers for all segments. Specifically, for each tumour clone \\(i\\in \\{1,\\,&#8230;,{n}\\}\\), the corresponding vertex \\({v}_{i}\\in V\\) is labelled in each segment \\(s\\in \\{1,\\,&#8230;,{m}\\}\\) by the copy number \\({c}_{s,i}^{{\\rm{A}}}\\in {\\mathbb{N}}\\) for allele A and \\({c}_{s,i}^{{\\rm{B}}}\\in {\\mathbb{N}}\\) for allele B. The normal root clone at vertex \\({v}_{0}\\) is labelled by \\({c}_{s,0}^{{\\rm{A}}}=\\,1\\) for allele A and \\({c}_{s,0}^{{\\rm{B}}}=1\\) for allele B. Each edge of the tree \\(({v}_{i},{v}_{j})\\in E\\) is thus labelled with events that are required to transform the copy numbers of clone \\(i\\) to those of clone \\(j\\). In particular, similarly to previous copy-number evolutionary models<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 14\" title=\"Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994&#x2013;1007 (2012).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR14\" id=\"ref-link-section-d54970559e3358\" target=\"_blank\" rel=\"noopener\">14<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 15\" title=\"Ha, G. et al. TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data. Genome Res. 24, 1881&#x2013;1893 (2014).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR15\" id=\"ref-link-section-d54970559e3361\" target=\"_blank\" rel=\"noopener\">15<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 17\" title=\"Cun, Y., Yang, T.-P., Achter, V., Lang, U. &amp; Peifer, M. Copy-number analysis and inference of subclonal populations in cancer genomes using Sclust. Nat. Protoc. 13, 1488&#x2013;1501 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR17\" id=\"ref-link-section-d54970559e3364\" target=\"_blank\" rel=\"noopener\">17<\/a>, we model each copy-number event for each segment independently as an event that either increases or decreases a copy number by one unit. As such, the label of each edge corresponds to \\({| {c}_{s,i}^{{\\rm{A}}}-{c}_{s,j}^{{\\rm{A}}}| }_{1}\\,+\\) \\({| {c}_{s,i}^{{\\rm{B}}}-{c}_{s,j}^{{\\rm{B}}}| }_{1}\\) events for each genomic segment \\(s\\), as \\({| {c}_{s,i}^{{\\rm{A}}}-{c}_{s,j}^{{\\rm{A}}}| }_{1}\\) events are required for allele A of each segment \\(s\\) and \\({| {c}_{s,i}^{{\\rm{B}}}-{c}_{s,j}^{{\\rm{B}}}| }_{1}\\) for allele B.<\/p>\n<p>Given a fixed topology for a tree \\(T\\), an arbitrarily high number of copy-number labellings for \\(T\\) can be defined in general to explain SCNA evolution of the \\(n\\) tumour clones. Therefore, we introduce an evolutionary model based on three biologically realistic assumptions that define the best copy-number-labelled tree \\(T\\) to represent SCNA evolution. First, similarly to previous models of SCNA evolution<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 4\" title=\"Kaufmann, T. L. et al. MEDICC2: whole-genome doubling aware copy-number phylogenies for cancer evolution. Genome Biol. 23, 241 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR4\" id=\"ref-link-section-d54970559e3815\" target=\"_blank\" rel=\"noopener\">4<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 31\" title=\"Zaccaria, S., El-Kebir, M., Klau, G. W. &amp; Raphael, B. J. Phylogenetic copy-number factorization of multiple tumor samples. J. Comput. Biol. 25, 689&#x2013;708 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR31\" id=\"ref-link-section-d54970559e3818\" target=\"_blank\" rel=\"noopener\">31<\/a>, we assume that LOH events are irreversible; that is, if all of the copies of one allele in any segment are lost, these cannot be regained by any descendant. Specifically, for any segment \\(s\\in \\{1,\\,&#8230;,{m}\\}\\) in any edge \\(({v}_{i},{v}_{j})\\in E\\) it holds that \\({c}_{s,i}^{{\\rm{A}}}=0\\,\\Rightarrow {c}_{s,j}^{{\\rm{A}}}=\\,0\\) and \\({c}_{s,i}^{{\\rm{B}}}=0\\,\\Rightarrow {c}_{s,j}^{{\\rm{B}}}=\\,0\\). Second, we limit the changes between gains and losses of the same genomic segment that occur along each evolutionary path to prevent overfitting (by default, the upper bound is 2; that is, only one type of change (negative change (loss) or positive change (gain)) can happen multiple times on a single path), whereby a path refers to the sequence of edges \\(\\{({v}_{0},{v}_{1}),\\,&#8230;,({v}_{i-1},{v}_{i})\\}\\) connecting the diploid root clone \\({v}_{0}\\) to any node \\({v}_{i}\\). Last, under the assumption of parsimony, we define the tree with the least number of copy-number events that explain the input data to be the more likely explanation of SCNA evolution, similarly to previous approaches<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 4\" title=\"Kaufmann, T. L. et al. MEDICC2: whole-genome doubling aware copy-number phylogenies for cancer evolution. Genome Biol. 23, 241 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR4\" id=\"ref-link-section-d54970559e4257\" target=\"_blank\" rel=\"noopener\">4<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 31\" title=\"Zaccaria, S., El-Kebir, M., Klau, G. W. &amp; Raphael, B. J. Phylogenetic copy-number factorization of multiple tumor samples. J. Comput. Biol. 25, 689&#x2013;708 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR31\" id=\"ref-link-section-d54970559e4260\" target=\"_blank\" rel=\"noopener\">31<\/a>.<\/p>\n<p>Model of the sequencing experiment<\/p>\n<p>Each bulk tumour sample comprises an unknown mixture of the \\(n\\) tumour clones. When sequencing \\(k\\) bulk tumour samples, we thus represent the proportion of each clone \\(i\\in \\{0,\\,&#8230;,{n}\\}\\) in each tumour sample \\(r\\in \\{1,\\,&#8230;,{k}\\}\\) as \\({u}_{i,r}\\in [0,\\,1]\\). Specifically, \\({u}_{i,r} &gt; 0\\) indicates that clone \\(i\\) is present in sample \\(r\\), and absent otherwise. We assume that, within each tumour sample \\(r\\), the tumour purity (that is, the proportion of all tumour clones) is \\({\\sum }_{i=1}^{n}{u}_{i,r}=1-{u}_{0,r}\\), in which \\({u}_{0,r}\\) represents the proportion of normal cells in sample \\(r\\).<\/p>\n<p>We cannot directly observe the copy numbers \\({c}_{s,i}^{{\\rm{A}}}\\) and \\({c}_{s,i}^{{\\rm{B}}}\\) of each tumour clone \\(i\\) in each tumour sample from bulk DNA-sequencing data. However, existing methods<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 16910&#x2013;16915 (2010).\" href=\"#ref-CR28\" id=\"ref-link-section-d54970559e4764\">28<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Favero, F. et al. Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data. Ann. Oncol. 26, 64&#x2013;70 (2015).\" href=\"#ref-CR29\" id=\"ref-link-section-d54970559e4764_1\">29<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 30\" title=\"Watkins, T. B. K. et al. Refphase: multi-sample phasing reveals haplotype-specific copy number heterogeneity. PLoS Comput. Biol. 19, e1011379 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR30\" id=\"ref-link-section-d54970559e4767\" target=\"_blank\" rel=\"noopener\">30<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 45\" title=\"Shen, R. &amp; Seshan, V. E. FACETS: allele-specific copy number and clonal heterogeneity analysis tool for high-throughput DNA sequencing. Nucleic Acids Res. 44, e131 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR45\" id=\"ref-link-section-d54970559e4770\" target=\"_blank\" rel=\"noopener\">45<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 46\" title=\"Carter, S. L. et al. Absolute quantification of somatic DNA alterations in human cancer. Nat. Biotechnol. 30, 413&#x2013;421 (2012).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR46\" id=\"ref-link-section-d54970559e4773\" target=\"_blank\" rel=\"noopener\">46<\/a> can provide estimates of the allele-specific, fractional copy numbers \\({\\widehat{f}}_{s,r}^{{\\rm{A}}}\\), \\({\\widehat{f}}_{s,r}^{{\\rm{B}}}\\) of each segment \\(s\\) in each tumour sample \\(r\\), such that \\({\\widehat{f}}_{s,r}^{{\\rm{A}}},{\\widehat{f}}_{s,r}^{{\\rm{B}}}\\in {{\\mathbb{R}}}^{\\ge 0}\\) are real-valued observed estimates of the average copy number of all cancer cells in the sample \\({f}_{s,r}^{{\\rm{A}}}=\\,{\\sum }_{i=1}^{n}{c}_{s,i}^{{\\rm{A}}}{u}_{i,r}\\) and \\({f}_{s,r}^{{\\rm{B}}}=\\,{\\sum }_{i=1}^{n}{c}_{s,i}^{{\\rm{B}}}{u}_{i,r}\\), respectively. Using existing methods, we can also observe estimates of the upper bound, \\({f}_{s,r}^{{\\rm{A}},+},\\,{f}_{s,r}^{{\\rm{B}},+}\\in {{\\mathbb{R}}}^{\\ge 0}\\), and lower bound, \\({f}_{s,r}^{{\\rm{A}},-},\\,{f}_{s,r}^{{\\rm{B}},-}\\in {{\\mathbb{R}}}^{\\ge 0}\\), of \\({f}_{s,r}^{{\\rm{A}}}\\) and \\({f}_{s,r}^{{\\rm{B}}}\\), for each sample \\(r\\) and genomic segment \\(s\\). The fractional copy number estimates, \\({\\widehat{f}}_{s,r}^{{\\rm{A}}}\\), \\({\\widehat{f}}_{s,r}^{{\\rm{B}}}\\), and their associated upper and lower bound estimates \\([{f}_{s,r}^{{\\rm{A}},-},{f}_{s,r}^{{\\rm{A}},+}]\\) and \\([{f}_{s,r}^{{\\rm{B}},-},{f}_{s,r}^{{\\rm{B}},+}]\\) represent measurements obtained from standard DNA-sequencing experiments.<\/p>\n<p>Problem formulation<\/p>\n<p>Hence, given allele-specific estimates of fractional copy numbers from multiple bulk DNA-sequenced tumour samples, ALPACA aims to infer the integer clone copy-number values \\({c}_{s,i}^{{\\rm{A}}}\\) and \\({c}_{s,i}^{{\\rm{B}}}\\) that minimize the objective functions above, similarly to the clone copy-number mixture deconvolution problem explored in previous work<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 2\" title=\"Zaccaria, S. &amp; Raphael, B. J. Accurate quantification of copy-number aberrations and whole-genome duplications in multi-sample tumor sequencing data. Nat. Commun. 11, 4301 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR2\" id=\"ref-link-section-d54970559e5880\" target=\"_blank\" rel=\"noopener\">2<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 3\" title=\"Myers, M. A. et al. HATCHet2: clone- and haplotype-specific copy number inference from bulk tumor sequencing data. Genome Biol. 25, 130 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR3\" id=\"ref-link-section-d54970559e5883\" target=\"_blank\" rel=\"noopener\">3<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 15\" title=\"Ha, G. et al. TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data. Genome Res. 24, 1881&#x2013;1893 (2014).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR15\" id=\"ref-link-section-d54970559e5886\" target=\"_blank\" rel=\"noopener\">15<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 16\" title=\"Oesper, L., Mahmoody, A. &amp; Raphael, B. J. THetA: inferring intra-tumor heterogeneity from high-throughput DNA sequencing data. Genome Biol. 14, R80 (2013).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR16\" id=\"ref-link-section-d54970559e5889\" target=\"_blank\" rel=\"noopener\">16<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 19\" title=\"McPherson, A. W. et al. ReMixT: clone-specific genomic structure estimation in cancer. Genome Biol. 18, 140 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR19\" id=\"ref-link-section-d54970559e5892\" target=\"_blank\" rel=\"noopener\">19<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 31\" title=\"Zaccaria, S., El-Kebir, M., Klau, G. W. &amp; Raphael, B. J. Phylogenetic copy-number factorization of multiple tumor samples. J. Comput. Biol. 25, 689&#x2013;708 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR31\" id=\"ref-link-section-d54970559e5895\" target=\"_blank\" rel=\"noopener\">31<\/a>. In contrast to existing methods, ALPACA does not however aim to jointly infer the number of clones present \\(n\\), and their clone proportions \\({u}_{i,r}\\). Instead, as described above, the hypothesis underlying ALPACA is that copy-number inference from fractional copy numbers can be improved by integrating other data that can be inferred by existing methods, in addition to the estimated fractional copy numbers: the topology of the tree \\(T\\) defined by the related edges \\(E\\), and the proportion \\({u}_{i,r}\\) of each clone \\(i\\) in each sample \\(r\\). Using fixed clone proportions is expected to improve the performance of copy-number inference because it simplifies the complex copy-number deconvolution problem into an easier linear optimization problem.<\/p>\n<p>In fact, the tree, \\(T\\), that is taken as input to ALPACA can be derived from standard approaches for phylogenetic reconstruction from SNVs<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Grigoriadis, K. et al. CONIPHER: a computational framework for scalable phylogenetic reconstruction with error correction. Nat. Protoc. 19, 159&#x2013;183 (2024).\" href=\"#ref-CR21\" id=\"ref-link-section-d54970559e6069\">21<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"El-Kebir, M., Satas, G., Oesper, L. &amp; Raphael, B. J. Inferring the mutational history of a tumor Ising multi-state perfect phylogeny mixtures. Cell Syst. 3, 43&#x2013;53 (2016).\" href=\"#ref-CR22\" id=\"ref-link-section-d54970559e6069_1\">22<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Popic, V. et al. Fast and scalable inference of multi-sample cancer lineages. Genome Biol. 16, 91 (2015).\" href=\"#ref-CR23\" id=\"ref-link-section-d54970559e6069_2\">23<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 24\" title=\"Malikic, S., McPherson, A. W., Donmez, N. &amp; Sahinalp, C. S. Clonality inference in multiple tumor samples using phylogeny. Bioinformatics 31, 1349&#x2013;1356 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR24\" id=\"ref-link-section-d54970559e6072\" target=\"_blank\" rel=\"noopener\">24<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 26\" title=\"Wintersinger, J. A. et al. Reconstructing complex cancer evolutionary histories from multiple bulk DNA samples using pairtree. Blood Cancer Discov. 3, 208&#x2013;219 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR26\" id=\"ref-link-section-d54970559e6075\" target=\"_blank\" rel=\"noopener\">26<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 27\" title=\"Myers, M. A., Satas, G. &amp; Raphael, B. J. CALDER: inferring phylogenetic trees from longitudinal tumor samples. Cell Syst. 8, 514&#x2013;522 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR27\" id=\"ref-link-section-d54970559e6078\" target=\"_blank\" rel=\"noopener\">27<\/a>. Similarly, clone proportions of the tumour clones \\({u}_{1,r},\\,&#8230;,\\,{u}_{n,r}\\) in sample \\(r\\) can be derived from standard approaches for tumour clonal reconstruction using SNVs<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Grigoriadis, K. et al. CONIPHER: a computational framework for scalable phylogenetic reconstruction with error correction. Nat. Protoc. 19, 159&#x2013;183 (2024).\" href=\"#ref-CR21\" id=\"ref-link-section-d54970559e6166\">21<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"El-Kebir, M., Satas, G., Oesper, L. &amp; Raphael, B. J. Inferring the mutational history of a tumor Ising multi-state perfect phylogeny mixtures. Cell Syst. 3, 43&#x2013;53 (2016).\" href=\"#ref-CR22\" id=\"ref-link-section-d54970559e6166_1\">22<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Popic, V. et al. Fast and scalable inference of multi-sample cancer lineages. Genome Biol. 16, 91 (2015).\" href=\"#ref-CR23\" id=\"ref-link-section-d54970559e6166_2\">23<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Malikic, S., McPherson, A. W., Donmez, N. &amp; Sahinalp, C. S. Clonality inference in multiple tumor samples using phylogeny. Bioinformatics 31, 1349&#x2013;1356 (2015).\" href=\"#ref-CR24\" id=\"ref-link-section-d54970559e6166_3\">24<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Jiang, Y., Qiu, Y., Minn, A. J. &amp; Zhang, N. R. Assessing intratumor heterogeneity and tracking longitudinal and spatial clonal evolutionary history by next-generation sequencing. Proc. Natl Acad. Sci. USA 113, E5528&#x2013;E5537 (2016).\" href=\"#ref-CR25\" id=\"ref-link-section-d54970559e6166_4\">25<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 26\" title=\"Wintersinger, J. A. et al. Reconstructing complex cancer evolutionary histories from multiple bulk DNA samples using pairtree. Blood Cancer Discov. 3, 208&#x2013;219 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR26\" id=\"ref-link-section-d54970559e6169\" target=\"_blank\" rel=\"noopener\">26<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Roth, A. et al. PyClone: statistical inference of clonal population structure in cancer. Nat. Methods 11, 396&#x2013;398 (2014).\" href=\"#ref-CR47\" id=\"ref-link-section-d54970559e6172\">47<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Gillis, S. &amp; Roth, A. PyClone-VI: scalable inference of clonal population structures using whole genome data. BMC Bioinform. 21, 571 (2020).\" href=\"#ref-CR48\" id=\"ref-link-section-d54970559e6172_1\">48<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 49\" title=\"Satas, G., Zaccaria, S., El-Kebir, M. &amp; Raphael, B. J. DeCiFering the elusive cancer cell fraction in tumor heterogeneity and evolution. Cell Syst 12, 1004&#x2013;1018 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR49\" id=\"ref-link-section-d54970559e6175\" target=\"_blank\" rel=\"noopener\">49<\/a>, and we assume that the tumour purity of each sample \\(r\\) (that is, \\((1-{u}_{0,r})\\)) can be estimated from standard approaches for tumour deconvolution from bulk DNA sequencing (for example, ASCAT<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 28\" title=\"Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 16910&#x2013;16915 (2010).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR28\" id=\"ref-link-section-d54970559e6240\" target=\"_blank\" rel=\"noopener\">28<\/a> and Sequenza<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 29\" title=\"Favero, F. et al. Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data. Ann. Oncol. 26, 64&#x2013;70 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR29\" id=\"ref-link-section-d54970559e6244\" target=\"_blank\" rel=\"noopener\">29<\/a>).<\/p>\n<p>Overall, ALPACA thus aims to infer the unknown copy numbers of each clone in a tree \\(T\\) with a given topology (that is, given the set of edges \\(E\\)) and given the fractional copy numbers in each sequenced tumour sample. Formally, ALPACA is framed as a linear optimization algorithm, aiming to find the copy numbers \\({c}_{s,i}^{{\\rm{A}}}\\) and \\({c}_{s,i}^{{\\rm{B}}}\\) for all clones \\(i\\in \\{1,\\,&#8230;,{n}\\}\\) in all segments \\(s\\in \\{1,\\,&#8230;,{m}\\}\\) that satisfy the following model objectives:<\/p>\n<ol class=\"u-list-style-none\">\n<li>\n                    1.<\/p>\n<p>Given the observed, allele-specific fractional copy number estimates in each sample \\({\\widehat{f}}_{s,r}^{{\\rm{A}}},{\\widehat{f}}_{s,r}^{{\\rm{B}}}\\), their upper and lower bound estimates \\([\\,{f}_{s,r}^{{\\rm{A}},-},{f}_{s,r}^{{\\rm{A}},+}\\,]\\) and \\([\\,{f}_{s,r}^{{\\rm{B}},-},{f}_{s,r}^{{\\rm{B}},+}\\,]\\), the clone proportions in each sample \\({u}_{i,r}\\), and constraints imposed by the phylogenetic tree \\(T\\), find \\({c}_{s,i}^{{\\rm{A}}},{c}_{s,i}^{{\\rm{B}}}\\in {\\mathbb{N}}\\) that minimize the number of samples for which \\({\\sum }_{i=1}^{n}{c}_{s,i}^{{\\rm{A}}}{u}_{i,r}\\,\\notin \\) \\([\\,{f}_{s,r}^{{\\rm{A}},-}\\,,{f}_{s,r}^{{\\rm{A}},+}\\,]\\) or \\({\\sum }_{i=1}^{n}{c}_{s,i}^{{\\rm{B}}}{u}_{i,r}\\notin [\\,{f}_{s,r}^{{\\rm{B}},-},{f}_{s,r}^{{\\rm{B}},+}\\,]\\).<\/p>\n<\/li>\n<li>\n                    2.<\/p>\n<p>Among these solutions, find \\({c}_{s,i}^{{\\rm{A}}}\\) and \\({c}_{s,i}^{{\\rm{B}}}\\) that minimize the absolute difference (Manhattan distance) between the predicted and expected fractional copy number, \\({| {\\widehat{f}}_{s,r}^{{\\rm{A}}}-{\\sum }_{i=1}^{n}{c}_{s,i}^{{\\rm{A}}}{u}_{i,r}| }_{1}\\) and \\({| {\\widehat{f}}_{s,r}^{{\\rm{A}}}-{\\sum }_{i=1}^{n}{c}_{s,i}^{{\\rm{B}}}{u}_{i,r}| }_{1}\\).<\/p>\n<\/li>\n<\/ol>\n<p>ALPACA solves this optimization problem using an algorithm based on a mixed-integer linear program (MILP implemented with gurobipy, v11.0.1) and a model selection approach defined in the following section and in the\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#MOESM1\" target=\"_blank\" rel=\"noopener\">Supplementary Information<\/a>.<\/p>\n<p>Inferring clone-specific copy numbers<\/p>\n<p>The ALPACA algorithm solves the problem stated above using a MILP and a model selection approach<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 2\" title=\"Zaccaria, S. &amp; Raphael, B. J. Accurate quantification of copy-number aberrations and whole-genome duplications in multi-sample tumor sequencing data. Nat. Commun. 11, 4301 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR2\" id=\"ref-link-section-d54970559e7553\" target=\"_blank\" rel=\"noopener\">2<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 3\" title=\"Myers, M. A. et al. HATCHet2: clone- and haplotype-specific copy number inference from bulk tumor sequencing data. Genome Biol. 25, 130 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR3\" id=\"ref-link-section-d54970559e7556\" target=\"_blank\" rel=\"noopener\">3<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 15\" title=\"Ha, G. et al. TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data. Genome Res. 24, 1881&#x2013;1893 (2014).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR15\" id=\"ref-link-section-d54970559e7559\" target=\"_blank\" rel=\"noopener\">15<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 31\" title=\"Zaccaria, S., El-Kebir, M., Klau, G. W. &amp; Raphael, B. J. Phylogenetic copy-number factorization of multiple tumor samples. J. Comput. Biol. 25, 689&#x2013;708 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR31\" id=\"ref-link-section-d54970559e7562\" target=\"_blank\" rel=\"noopener\">31<\/a>. Specifically, given a fixed number of copy-number events \\({\\lambda }_{s}\\in {\\mathbb{N}}\\), ALPACA identifies the copy numbers that minimize the above model objectives 1 and 2 under the previously described constraints, using an MILP formulation (details are reported in\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#MOESM1\" target=\"_blank\" rel=\"noopener\">Supplementary Information<\/a>). That is, for each segment \\(s\\), we aim to find \\({c}_{s,i}^{{\\rm{A}}}\\) and \\({c}_{s,i}^{{\\rm{B}}}\\) that minimize the objective functions subject to constraints described above, such that the total number of copy-number events across all edges \\(({v}_{i},{v}_{j})\\) of the tree \\(T\\) (including both gains and losses), and both alleles is less than \\({\\lambda }_{s}\\). ALPACA performs this optimization for each segment independently, as the evolution of each segment is modelled independently for the fixed tree topology. This allows ALPACA to scale to tumours with a high burden of subclonal SCNAs and thus a high number of segments at once.<\/p>\n<p>The number of copy-number events \\({\\lambda }_{s}\\) that occur throughout evolution is unknown. Therefore, similarly to previous studies<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 4\" title=\"Kaufmann, T. L. et al. MEDICC2: whole-genome doubling aware copy-number phylogenies for cancer evolution. Genome Biol. 23, 241 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR4\" id=\"ref-link-section-d54970559e7825\" target=\"_blank\" rel=\"noopener\">4<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 31\" title=\"Zaccaria, S., El-Kebir, M., Klau, G. W. &amp; Raphael, B. J. Phylogenetic copy-number factorization of multiple tumor samples. J. Comput. Biol. 25, 689&#x2013;708 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR31\" id=\"ref-link-section-d54970559e7828\" target=\"_blank\" rel=\"noopener\">31<\/a>, we apply a model selection approach to find \\({\\lambda }_{s}\\) while finding the best compromise between solutions with higher complexity (higher values of \\({\\lambda }_{s}\\)) and those with better fit of the data (lower values of the objective function). Specifically, we consider an increasing number of copy-number events \\({\\lambda }_{s}\\), and we select the best number of events, \\(\\hat{{\\lambda }_{s}}\\), using the elbow approach implemented by the Kneedle algorithm (kneed v0.8.5)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 50\" title=\"Satopaa, V., Albrecht, J., Irwin, D. &amp; Raghavan, B. Finding a &#x2018;kneedle&#x2019; in a haystack: detecting knee points in system behavior. In Proc. 2011 31st International Conference on Distributed Computing Systems Workshops 166&#x2013;171 (IEEE, 2011); &#010;                https:\/\/doi.org\/10.1109\/ICDCSW.2011.20&#010;                &#010;              .\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR50\" id=\"ref-link-section-d54970559e7953\" target=\"_blank\" rel=\"noopener\">50<\/a>. Further details on the model selection algorithm used by ALPACA are available in the\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#MOESM1\" target=\"_blank\" rel=\"noopener\">Supplementary Information<\/a>.<\/p>\n<p>The ALPACA algorithm outputs inferred clone copy numbers \\({c}_{s,i}^{{\\rm{A}}}\\) and \\({c}_{s,i}^{{\\rm{B}}}\\) for each segment \\(s\\in \\{1,\\,&#8230;,{m}\\}\\) within every ancestral and observed tumour clone in the \\(k\\) sequenced samples.<\/p>\n<p>Run time<\/p>\n<p>In the experimental TRACERx421 primary cohort, the total CPU time for a single tumour ranged from 48\u2009s to 36\u2009h with a median run time of 19\u2009min. The differences in run time across tumours can be partially explained by the number of segments (ALPACA iterates over each segment independently) and by the number of SNV clones detected in a tumour; mean purity and ploidy of tumours show only a very weak correlation with the CPU run time (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#Fig8\" target=\"_blank\" rel=\"noopener\">2<\/a>). Iterating over genomic segments independently allows for a very efficient parallelization, in which segments predicted to complete quickly can be grouped into large batches, and segments predicted to complete slowly can be grouped into small ones.<\/p>\n<p>Benchmarking ALPACA on simulated dataMASCoTE simulations<br \/>\n                  Creating input<\/p>\n<p>We used the previously published simulation framework MASCoTE<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 2\" title=\"Zaccaria, S. &amp; Raphael, B. J. Accurate quantification of copy-number aberrations and whole-genome duplications in multi-sample tumor sequencing data. Nat. Commun. 11, 4301 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR2\" id=\"ref-link-section-d54970559e8129\" target=\"_blank\" rel=\"noopener\">2<\/a> to simulate multi-sample bulk sequencing, using the following steps.<\/p>\n<p>First, to create the input required by ALPACA, we mapped the 50-kilobase binned simulated data to the ground-truth segments, represented by \\(S\\). Second, we calculated the average read depth \\(R\\) of each segment \\(s\\in \\{1,\\,&#8230;,{|S|}\\}\\) and each sample \\(r\\in \\{1,\\,&#8230;,{k}\\}\\), \\({R}_{s,{r}}\\), by averaging \\(R\\) across all bins \\(b\\in B\\) overlapping with a segment \\(s\\):<\/p>\n<p>\\({R}_{s,{r}}=\\frac{1}{|{B}_{s}|}\\sum _{b\\in {B}_{s}}{R}_{b}\\), in which \\(|{B}_{s}|\\) is the number of bins overlapping with segment \\(s\\), and \\({R}_{b}\\) is the read depth of bin \\(b\\). Using the same method, we then calculated the BAF \\(B\\) for each segment \\(s\\) and each sample \\(r\\), using the following equation:<\/p>\n<p>$${B}_{s,r}=\\frac{1}{|{B}_{s}|}\\sum _{b\\in {B}_{s}}{B}_{b}$$<\/p>\n<p>For each segment \\(s\\) and each sample \\(r\\), we then determined the single nucleotide polymorphism (SNP) confidence intervals by fitting a t-distribution to the read depth values within each segment and sample. To simulate the experimental noise, we first calculated the true fractional copy numbers \\({F}_{T}\\) for each sample \\(r\\) and segment \\(s\\) by taking the dot product of two vectors: the vector representing the clone proportions in the sample \\({{\\bf{U}}}_{{\\bf{r}}}\\) and the vector of integer copy numbers of the clones present in the sample, \\({\\bf{C}}\\):<\/p>\n<p>$${F}_{T}={{\\bf{U}}}_{{\\bf{r}}}\\cdot {\\bf{C}}$$<\/p>\n<p>Using the true fractional copy numbers \\({F}_{T,r}\\) and read depth \\({R}_{r}\\) for each sample \\(r\\), we derived a scaling factor for each segment and each sample, \\({\\varGamma }_{S,R}\\), by dividing \\({F}_{T,r}\\) by the \\({R}_{r}\\) values for each segment \\(s\\) and each sample \\(r\\). By taking a mean of \\({\\varGamma }_{s,r}\\) across all segments for each sample, we obtained the mean scaling factor per sample \\(\\bar{{\\varGamma }_{r}}\\):<\/p>\n<p>$${\\varGamma }_{s,r}=\\frac{{F}_{T,s,r}}{{R}_{s,r}}$$<\/p>\n<p>$$\\bar{{\\varGamma }_{r}}=\\frac{1}{|S|}\\sum _{s\\in S}{\\varGamma }_{s,r}$$<\/p>\n<p>By multiplying the mean scaling factor per sample, \\({\\varGamma }_{r}\\), by \\(R\\) of each segment in each sample, we obtained the final total fractional copy numbers, \\(F\\), for each segment in each sample:<\/p>\n<p>$${F}_{s,r}=\\bar{{\\varGamma }_{r}}\\times {R}_{s,r}$$<\/p>\n<p>Last, using previously calculated \\(B\\), we obtained fractional copy numbers for alleles A and B (\\({F}^{{\\rm{A}}}\\) and \\({F}^{{\\rm{B}}}\\)) for each segment and each sample:<\/p>\n<p>$${F}_{s,r}^{{\\rm{A}}}={F}_{s,r}\\times ({1-B}_{s,r})$$<\/p>\n<p>$${F}_{s,r}^{{\\rm{B}}}={F}_{s,r}\\times {B}_{s,r}$$<\/p>\n<p>                  Accuracy calculation<\/p>\n<p>To compare ALPACA, HATCHet, HATCHet2 and cloneHD results using MASCoTE simulations, we used the accuracy metric adapted from a similar metric used in the HATCHet2 publication<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 3\" title=\"Myers, M. A. et al. HATCHet2: clone- and haplotype-specific copy number inference from bulk tumor sequencing data. Genome Biol. 25, 130 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR3\" id=\"ref-link-section-d54970559e9754\" target=\"_blank\" rel=\"noopener\">3<\/a>. Let \\({S}_{s}\\) signify the set of allele-specific, integer copy-number states inferred by a model for a specific genomic segment \\(s\\). Let \\({T}_{s}\\) signify the set of true integer copy-number states for segment \\(s\\). Let \\({l}_{s}\\) signify the length of segment \\(s\\) (expressed in base pairs), and let \\(L=\\sum _{s}{l}_{s}\\) represent the total length of genomic segments under consideration. The accuracy of a specific model can then be represented as<\/p>\n<p>$${\\rm{Accuracy}}=\\frac{1}{L}{\\Sigma }_{s}{l}_{s}\\frac{| {S}_{s}\\bigcap {T}_{s}| }{| {S}_{s}\\bigcup {T}_{s}| }$$<\/p>\n<p>                Simple model<br \/>\n                  Tx simulations in simple model<\/p>\n<p>To recapitulate the clonal and SCNA landscape of NSCLC, we used the previously described simulated cohort<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\" title=\"Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525&#x2013;533 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR5\" id=\"ref-link-section-d54970559e10093\" target=\"_blank\" rel=\"noopener\">5<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 21\" title=\"Grigoriadis, K. et al. CONIPHER: a computational framework for scalable phylogenetic reconstruction with error correction. Nat. Protoc. 19, 159&#x2013;183 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR21\" id=\"ref-link-section-d54970559e10096\" target=\"_blank\" rel=\"noopener\">21<\/a>, which provided us with phylogenetic trees, copy-number events across the genome and clone proportions in each tumour sample. To obtain the ALPACA input using these data, we calculated the true fractional copy number for each simulated genomic segment and matched it with a real segment from the TRACERx421 primary cohort<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\" title=\"Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525&#x2013;533 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR5\" id=\"ref-link-section-d54970559e10100\" target=\"_blank\" rel=\"noopener\">5<\/a> (randomly selected from a pool of segments with similar length and fractional copy number: tolerance \u00b110%). To obtain confidence intervals of these fractional copy-number values per segment, we assigned SNPs of the true segment to the simulated segment, and corrected the copy numbers of the assigned SNPs by the difference between the fractional copy-number states of the simulated segment and the true matched segment.<\/p>\n<p>                  Implementation<\/p>\n<p>The simple model was implemented similarly to the method described previously<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\" title=\"Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525&#x2013;533 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR5\" id=\"ref-link-section-d54970559e10112\" target=\"_blank\" rel=\"noopener\">5<\/a>. In brief, for each segment, clone and allele, the clone-specific copy number was computed as a rounded fractional copy number (to the nearest integer) of a sample in which a given clone was most prevalent, on the basis of the difference in cancer cell fraction values between samples in which the event is and is not present. For clones that have no representation in any sample, copy number was inherited from the closest ancestor. If no ancestors were available for a clone, this clone\u2019s copy number was assigned to take the copy-number states of its children.<\/p>\n<p>                Matching comparison between methods<br \/>\n                  Creating input<\/p>\n<p>To evaluate the ability of ALPACA to accurately infer the evolution of SCNAs, we extended the previously created realistic simulated instances that modelled SNV evolution (Tx simulations)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\" title=\"Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525&#x2013;533 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR5\" id=\"ref-link-section-d54970559e10129\" target=\"_blank\" rel=\"noopener\">5<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 21\" title=\"Grigoriadis, K. et al. CONIPHER: a computational framework for scalable phylogenetic reconstruction with error correction. Nat. Protoc. 19, 159&#x2013;183 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR21\" id=\"ref-link-section-d54970559e10132\" target=\"_blank\" rel=\"noopener\">21<\/a>. This simulated dataset contained 2,704 clones and 87,393 genomic segments, amounting to 3,179,926 clone- and allele-specific segments. These simulated instances modelled not only truncal and subclonal SNVs, but also the effect of multiple other genomic alterations such as truncal and subclonal SCNAs and whole-genome duplication on SNV evolution, enabling their use to evaluate the reconstruction of SCNA evolution as well as SNV evolution. Additionally, the simulations were developed to reflect real non-small cell lung cancer phylogenies by using distributions of statistics measured from sequencing data for the TRACERx421 cohort<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\" title=\"Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525&#x2013;533 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR5\" id=\"ref-link-section-d54970559e10136\" target=\"_blank\" rel=\"noopener\">5<\/a>, such as the number of samples per simulation, the frequency of SCNAs and the proportion of events occurring truncally or subclonally. We used these simulations to evaluate the performance of phylogeny reconstruction using CONIPHER<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 21\" title=\"Grigoriadis, K. et al. CONIPHER: a computational framework for scalable phylogenetic reconstruction with error correction. Nat. Protoc. 19, 159&#x2013;183 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR21\" id=\"ref-link-section-d54970559e10140\" target=\"_blank\" rel=\"noopener\">21<\/a> and ALPACA, and compared the results to the existing state-of-the-art copy-number reconstruction tools HATCHet2 (ref.\u2009<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 3\" title=\"Myers, M. A. et al. HATCHet2: clone- and haplotype-specific copy number inference from bulk tumor sequencing data. Genome Biol. 25, 130 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR3\" id=\"ref-link-section-d54970559e10144\" target=\"_blank\" rel=\"noopener\">3<\/a>) and MEDICC2 (ref.\u2009<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 4\" title=\"Kaufmann, T. L. et al. MEDICC2: whole-genome doubling aware copy-number phylogenies for cancer evolution. Genome Biol. 23, 241 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR4\" id=\"ref-link-section-d54970559e10148\" target=\"_blank\" rel=\"noopener\">4<\/a>).<\/p>\n<p>To reconstruct copy-number evolution, each method required sequencing information across the entire genome. However, as the Tx simulation framework was developed to simulate SNV evolution, the instances contained simulated sequencing data for only select regions of the genome (those affected by simulated SNVs). We therefore located the positions across the simulated genome with copy-number information and defined the size of the copy number alterations to cover between 50 and 100\u2009mb, resulting in a simulated genome of 1-mb bins, for which each bin is denoted by \\(i\\). Any bins containing more than one simulated SNV were split into multiple smaller bins. For each sample \\(s\\), the ploidy \\({\\rho }_{s}\\) and purity \\({\\mu }_{s}\\) values from the Tx simulated sequencing data were used.<\/p>\n<p>The sequencing depth of each bin, or the observed total number of reads \\({t}_{i,s}\\), was modelled using a Poisson distribution with the mean equal to the expected total number of reads present in each bin \\(i\\) for all cells in each sample \\(s\\). The expected total number of reads can be calculated from the true fractional copy number \\({f}_{i,s}\\), in which \\({f}_{i,s}\\) corresponds to the average total copy number of each bin \\(i\\) for all cells present in sample \\(s\\). The values of \\({f}_{i,s}\\), as well as the expected allele-specific fractional copy number for allele A \\({x}_{i,s}\\) and allele B \\({y}_{i,s}\\), for bins containing a SCNA were populated using the respective true fractional copy number values calculated in the Tx simulations, whereas the values for any bins without a SCNA were populated with either a diploid or tetraploid copy-number value depending on the most common value in the simulated sequencing data for each sample. The expected total number of reads is equal to \\(\\frac{{f}_{i,s}}{{\\rho }_{s}}{\\gamma }_{s}\\), in which \\({\\gamma }_{s}\\) represents the expected average coverage per sample. \\({\\gamma }_{s}\\) was set to a value of 1,000 for all samples \\(s\\) in all simulated instances. Therefore, we simulated \\({t}_{i,s}\\) as drawn from a Poisson distribution with the mean equal to the expected number of total reads \\({t}_{i,s}\\, \\sim {\\rm{Poisson}}\\left(\\frac{{f}_{i,s}}{{\\rho }_{s}}{\\gamma }_{s}\\right)\\).<\/p>\n<p>We next used a binomial model to simulate the observed B-allele frequency (BAF) of each bin. We defined the true BAF \\({\\beta }_{i,s}\\) for each bin \\(i\\) and sample \\(s\\) on the basis of the fractional copy number of both alleles \\({f}_{i,s}\\) and allele B \\({y}_{i,s}\\) as \\({\\beta }_{i,s}\\,=\\,\\frac{{y}_{i,s}}{{f}_{i,s}}\\). We then modelled the observed BAF \\({\\hat{\\beta }}_{i,s}\\) of each bin using the binomial distribution with the probability of success equal to \\({\\beta }_{i,s}\\) and the number of trials equal to a simulated coverage value \\({\\gamma }_{s}\\) of 1,000, such that \\({\\widehat{\\beta }}_{i,s}\\, \\sim \\,{\\rm{binomial}}({\\gamma }_{s},{\\beta }_{i,s})\\).<\/p>\n<p>The observed fractional copy-number values for each bin \\(i\\) and sample \\(s\\) were then calculated for allele A \\({\\hat{x}}_{i,s}\\) and allele B \\({\\hat{y}}_{i,s}\\) on the basis of the observed total number of reads present in each bin \\({t}_{i,s}\\), the ploidy of the sample \\({\\rho }_{s}\\), and the observed BAF \\({\\hat{\\beta }}_{i,s}\\) using the following equations for allele A and allele B, respectively: \\({\\hat{x}}_{i,s}\\,={t}_{i,s}\\times {\\rho }_{s}\\times (1-{\\hat{\\beta }}_{i,s})\\), \\({\\hat{y}}_{i,s}\\,={t}_{i,s}\\times {\\rho }_{s}\\times {\\hat{\\beta }}_{i,s}\\).<\/p>\n<p>                  Parameters used for HATCHet2 and MEDICC2<\/p>\n<p>Using our set of ground-truth simulations (Tx simulations), we compared the performance of CONIPHER\u2009+\u2009ALPACA to HATCHet2\u2009+\u2009MEDICC2 to accurately infer copy-number evolution. HATCHet2 (v2.0.1) was run with default parameters, except for an increase in the possible number of clones identified. A maximum value of 12 clones was used for simulations from LTXSIM001 to LTXSIM050, and a maximum value of 18 clones was used for simulations from LTXSIM051 to LTXSIM150, owing to the increased number of clones in the ground-truth instances. MEDICC2 (v2.0.4) was run with default parameters. For one simulated case (LTXSIM063), HATCHet2 did not complete within the 72\u2009h time limit, and we excluded this case from the comparison.<\/p>\n<p>                Comparison metrics<br \/>\n                  HD matching clone comparison<\/p>\n<p>To evaluate model performance using HD, we followed the following procedure. For each simulated clone, we calculated the HD between its ground-truth copy-number profile and the copy-number profiles of all clones predicted by the model. This allowed us to identify the most similar predicted clone for each true clone. After assigning a matching predicted clone and its corresponding HD to each true clone, we computed the mean of these distances. This final mean value represents how accurately the model retrieved the true copy-number states.<\/p>\n<p>                  TVD<\/p>\n<p>To evaluate model performance using TVD, we considered the output of each method (that is, copy-number states for each genomic segment and clone proportions associated with each such state) as probability distribution over discrete copy-number states and calculated the TVD between true and inferred distributions for each model. Owing to the high number of genomic segments in Tx simulations, TVD in Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#Fig2\" target=\"_blank\" rel=\"noopener\">2c<\/a> is averaged per patient before plotting.<\/p>\n<p>                Single-cell data processing<\/p>\n<p>To evaluate the performance of ALPACA on real tumour data, we harnessed single-cell data acquired from 10x Genomics sequencing of a breast tumour and used CHISEL<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 35\" title=\"Zaccaria, S. &amp; Raphael, B. J. Characterizing allele- and haplotype-specific copy numbers in single cells with CHISEL. Nat. Biotechnol. 39, 207&#x2013;214 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR35\" id=\"ref-link-section-d54970559e11703\" target=\"_blank\" rel=\"noopener\">35<\/a> to estimate the copy number, reconstruct clonal architecture of the samples, generate ground-truth copy numbers and generate input data required by ALPACA. This dataset contained five samples (A\u2013E). Sample A contained only normal cells and was excluded from this analysis. On the basis of the known number of cells within each clone, we calculated clone proportions in each sample and created a ground-truth reference for all clones excluding the MRCA, for which ground-truth data were not available. Next, we created confidence intervals for each segment and each sample by fitting the normal distribution to each segment\u2019s fractional copy-number state with mean 0 and standard deviation of 0.1 to simulate noise. These data, together with the phylogenetic tree obtained from the original study<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 35\" title=\"Zaccaria, S. &amp; Raphael, B. J. Characterizing allele- and haplotype-specific copy numbers in single cells with CHISEL. Nat. Biotechnol. 39, 207&#x2013;214 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR35\" id=\"ref-link-section-d54970559e11707\" target=\"_blank\" rel=\"noopener\">35<\/a>, were used as input to ALPACA. In this dataset, two clones dominate the tumour (JII and JIV), with the largest other clone corresponding to only approximately 4% of the sampled tumour (average across all samples, Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#Fig9\" target=\"_blank\" rel=\"noopener\">3e<\/a>), and thus probably below the threshold of SNV clone detection using bulk sequencing using standard approaches<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\" title=\"Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525&#x2013;533 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR5\" id=\"ref-link-section-d54970559e11714\" target=\"_blank\" rel=\"noopener\">5<\/a>. Therefore, we excluded these small clones from the comparison.<\/p>\n<p>Identifying regions affected by extensive copy-number heterogeneity<\/p>\n<p>To identify regions affected by extensive SCNA events, we calculated the coefficient of variation for each allele\u2019s fractional copy number across all samples within each genomic segment. We then averaged these values to obtain the mean coefficient of variation for each segment. Next, we ranked the genomic segments within each tumour on the basis of their mean coefficient of variation and created four cohorts by progressively filtering out the top 5%, 15%, 25% or 50% of the genome with the highest variation.<\/p>\n<p>Mutation clustering concordance<\/p>\n<p>We calculated clustering concordance scores for each pair of mutations, by checking whether mutations in both sets of results retained the same phylogenetic relationships in the default CONIPHER output compared to a cohort with 5%, 15%, 25% or 50% of the genome with the highest copy number variation removed. We classified each pair of mutations as: belonging to the same cluster; ancestor\u2013descendant; descendant\u2013ancestor; or parallel (not belonging to any other class). We scored each pair as concordant if the class of phylogenetic relationship was found to be the same in the full cohort and filtered cohort. The final score represents the fraction of concordant pairs out of all the possible pairs<\/p>\n<p>Preprocessing real data for input to ALPACACohort<\/p>\n<p>All biological analyses in Figs. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#Fig3\" target=\"_blank\" rel=\"noopener\">3<\/a>\u2013<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#Fig6\" target=\"_blank\" rel=\"noopener\">6<\/a>\u00a0and Extended Data Figs. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#Fig7\" target=\"_blank\" rel=\"noopener\">1d<\/a>, <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#Fig8\" target=\"_blank\" rel=\"noopener\">2<\/a> and <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#Fig11\" target=\"_blank\" rel=\"noopener\">5<\/a>\u2013<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#Fig16\" target=\"_blank\" rel=\"noopener\">10<\/a>) were performed on the processed multi-sample bulk WES data from the recent longitudinal, prospective cohort TRACERx421 (refs.\u2009<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\" title=\"Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525&#x2013;533 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR5\" id=\"ref-link-section-d54970559e11767\" target=\"_blank\" rel=\"noopener\">5<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 6\" title=\"Al Bakir, M. et al. The evolution of non-small cell lung cancer metastases in TRACERx. Nature 616, 534&#x2013;542 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR6\" id=\"ref-link-section-d54970559e11770\" target=\"_blank\" rel=\"noopener\">6<\/a>). For each tumour, the eligibility criteria for input to ALPACA included: allele-specific copy number computed for at least two primary tumour samples; a tumour phylogenetic tree computed from SNVs present across these same tumour samples, comprising at least two tumour clones (that is, at least an MRCA clone and one subclonal tree edge); and computed clone proportions of each clone on the phylogenetic tree in each sample. This resulted in n\u2009=\u2009395 primary tumours<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\" title=\"Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525&#x2013;533 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR5\" id=\"ref-link-section-d54970559e11777\" target=\"_blank\" rel=\"noopener\">5<\/a> and n\u2009=\u2009126 paired primary\u2013metastatic tumours with all of the correct inputs required for ALPACA.<\/p>\n<p>Obtaining allele-specific fractional copy numbers per bulk tumour sample<\/p>\n<p>The fractional copy numbers were further processed in the following way. We first identified clonal segments, by performing a statistical test to identify whether the fractional, allele-specific copy numbers were significantly different from their closest integer (t-test, P-value threshold of 0.0001). Segments with values significantly different from the nearest integers were considered subclonal. We further tested for evidence of allelic imbalance, by performing a statistical test to identify whether the fractional, allele-specific copy numbers for both alleles were different from one another in at least one sample (t-test, P-value threshold of 0.001). For each subclonal segment without evidence for allelic imbalance in any sample, we recalculated their total fractional copy number by summing the original fractional copy numbers for both alleles and assigning an integer copy number (closest integer to the original fractional copy number of respective allele) to allele B if the original fractional copy numbers for both alleles were larger than the closest integers for both alleles, or to allele A if they were less than the closest integer. The remaining allele (A or B, respectively) was assigned a fractional copy number calculated as the sum of its nearest integer and total fractional copy number minus the sum of closest integers for both original fractional copy numbers.<\/p>\n<p>Correcting erroneous homozygous deletions<\/p>\n<p>To prevent small clones being predicted as having copy number zero in both alleles for long stretches of the genome, the ALPACA method additionally implements constraints on the permitted number of homozygous deletions. Specifically, we introduce a threshold \\({h}=1\\) within each genomic segment \\(s\\in \\{1,\\,&#8230;,{m}\\}\\), such that a clone can have a homozygous deletion only if: the clone or its descendants are present in a tumour sample that has a fractional copy number of less than the threshold \\(h\\) in both alleles; and the genomic segment is shorter than 50\u2009MB.<\/p>\n<p>Obtaining a tumour phylogenetic tree<\/p>\n<p>To obtain a single tumour phylogenetic tree per patient case in the TRACERx study, we used the published phylogenetic trees from the recent primary<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\" title=\"Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525&#x2013;533 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR5\" id=\"ref-link-section-d54970559e11907\" target=\"_blank\" rel=\"noopener\">5<\/a> and primary\u2013metastasis<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 6\" title=\"Al Bakir, M. et al. The evolution of non-small cell lung cancer metastases in TRACERx. Nature 616, 534&#x2013;542 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR6\" id=\"ref-link-section-d54970559e11911\" target=\"_blank\" rel=\"noopener\">6<\/a> TRACERx421 cohorts. These tumour phylogenies were constructed from DNA-sequencing data using CONIPHER<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 21\" title=\"Grigoriadis, K. et al. CONIPHER: a computational framework for scalable phylogenetic reconstruction with error correction. Nat. Protoc. 19, 159&#x2013;183 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR21\" id=\"ref-link-section-d54970559e11915\" target=\"_blank\" rel=\"noopener\">21<\/a>, as described previously<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\" title=\"Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525&#x2013;533 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR5\" id=\"ref-link-section-d54970559e11919\" target=\"_blank\" rel=\"noopener\">5<\/a>. For tumour cases with multiple possible tumour phylogenetic trees, the tree with the lowest error was selected<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 21\" title=\"Grigoriadis, K. et al. CONIPHER: a computational framework for scalable phylogenetic reconstruction with error correction. Nat. Protoc. 19, 159&#x2013;183 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR21\" id=\"ref-link-section-d54970559e11923\" target=\"_blank\" rel=\"noopener\">21<\/a>.<\/p>\n<p>Obtaining tumour clone proportions<\/p>\n<p>For each tumour case, clone proportions of each tumour clone in each tumour sample were computed from the phylogenetic tree and mutation-cluster mean phylogenetic cancer cell fraction (PhyloCCF) values using the R function compute_subclone_proportions from the CONIPHER R package (v2.2.0)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 21\" title=\"Grigoriadis, K. et al. CONIPHER: a computational framework for scalable phylogenetic reconstruction with error correction. Nat. Protoc. 19, 159&#x2013;183 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR21\" id=\"ref-link-section-d54970559e11936\" target=\"_blank\" rel=\"noopener\">21<\/a>. The input parameter value force_clonal_100 was set to be TRUE, to force clones whose distribution of associated mutation PhyloCCF values was not statistically distinct from that of the truncal cluster PhyloCCF values in certain tumour samples to have mean PhyloCCF\u2009=\u20091 in these samples. This ensured that when computing the clone proportions the clonal diversity was reduced and prevented noisy cancer cell fraction estimates resulting in samples harbouring clones with very small clone proportion.<\/p>\n<p>Computing clone ploidy<\/p>\n<p>We used ALPACA to compute the clone ploidy for every clone from each tumour run through ALPACA in the TRACERx421 primary and primary\u2013metastasis cohorts<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\" title=\"Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525&#x2013;533 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR5\" id=\"ref-link-section-d54970559e11949\" target=\"_blank\" rel=\"noopener\">5<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 6\" title=\"Al Bakir, M. et al. The evolution of non-small cell lung cancer metastases in TRACERx. Nature 616, 534&#x2013;542 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR6\" id=\"ref-link-section-d54970559e11952\" target=\"_blank\" rel=\"noopener\">6<\/a>. Clone ploidy was defined as follows: considering clone \\(i\\) with total copy number \\({c}_{s,i}={c}_{s,i}^{{\\rm{A}}}+{c}_{s,i}^{{\\rm{B}}}\\) in genomic segment \\(s\\), with segment size \\({w}_{s}\\), then the clone ploidy, \\({p}_{i}\\), was computed as the average clone copy number across all genomic segments, weighted by segment size:<\/p>\n<p>$${p}_{i}=\\frac{{\\sum }_{s=1}^{m}{w}_{s}{c}_{s,i}}{{\\sum }_{s=1}^{m}{w}_{s}}$$<\/p>\n<p>WGD detection using ALPACA<\/p>\n<p>To identify WGD events, we compared copy-number states across the genome between each clone and its parent and grandparent. If the mean ratio, weighted by segment length, between the child clone and the parent clone was above 1.5, we classified the child clone as genome-doubled. If this ratio was below 1.5, but the ratio between the child and the grandparent was above 1.5 and the parent clone was not already classified as genome-doubled, we also classified the child clone as genome-doubled.<\/p>\n<p>Defining copy-number events on edges of the phylogenetic treeDefining segment-specific copy-number events<\/p>\n<p>For each tumour case and each genomic segment, we inferred the allele-specific copy number change on each edge of the phylogenetic tree on the basis of ALPACA\u2019s output. Next, we applied the following criteria to classify whether a copy-number event had occurred on each edge. Considering an edge of the phylogenetic tree \\(({v}_{i},{v}_{j})\\in E(T)\\), and the corresponding allele-specific clone copy number of the parent clone \\({{c}^{{\\rm{A}}}}_{i}\\), \\({{c}^{{\\rm{B}}}}_{i}\\) and the child clone \\({{c}^{{\\rm{A}}}}_{j}\\), \\({{c}^{{\\rm{B}}}}_{j}\\), we define the following criteria for copy-number events, in which \\({p}_{i}\\) represents the clone ploidy of clone \\(i\\):<\/p>\n<p>Event name: criteria:<\/p>\n<p>Copy-number gain: \\({{c}^{{\\rm{A}}}}_{j} &gt; \\,{{c}^{{\\rm{A}}}}_{i}\\) or \\({{c}^{{\\rm{B}}}}_{j} &gt; {{c}^{{\\rm{B}}}}_{i}\\)<\/p>\n<p>Copy-number loss: \\({{c}^{{\\rm{A}}}}_{j}  or \\({{c}^{{\\rm{B}}}}_{j} <\/p>\n<p>LOH: \\({{c}^{{\\rm{A}}}}_{i}\\,\\ne \\,0\\,\\&amp; \\,{{c}^{{\\rm{A}}}}_{j}\\,=\\,0\\) or \\({{c}^{{\\rm{B}}}}_{i}\\,\\ne \\,0\\,\\&amp; \\,{{c}^{{\\rm{B}}}}_{j}\\,=\\,0\\,\\)<\/p>\n<p>Amplification: \\({{c}^{{\\rm{A}}}}_{j} &gt; 2\\times \\,{{c}^{{\\rm{A}}}}_{i}\\,\\&amp; \\,{{c}^{{\\rm{A}}}}_{j}\\ge 4\\,\\&amp; \\,{{c}^{{\\rm{A}}}}_{j} &gt; {p}_{j}\\)<\/p>\n<p>or \\({{c}^{{\\rm{B}}}}_{j} &gt; 2\\times \\,{{c}^{{\\rm{B}}}}_{i}\\,\\,{\\rm{\\&amp; }}\\,{{c}^{{\\rm{B}}}}_{j}\\ge 4\\,\\&amp; \\,{{c}^{{\\rm{B}}}}_{j} &gt; {p}_{j}\\)<\/p>\n<p>Defining chromosome-arm-level copy-number events<\/p>\n<p>For each edge of the phylogenetic tree, we considered a whole chromosome arm to be affected by a copy-number event on the basis of the following classification, based on previous sample-level chromosome-arm copy-number calling analysis<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 10\" title=\"Jamal-Hanjani, M. et al. Tracking the evolution of non-small-cell lung cancer. N. Engl. J. Med. 376, 2109&#x2013;2121 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR10\" id=\"ref-link-section-d54970559e13442\" target=\"_blank\" rel=\"noopener\">10<\/a>:<\/p>\n<p>Chromosome-arm event name: criteria<\/p>\n<p>Copy-number gain: 90% of the chromosome arm is affected by gain, for either allele<\/p>\n<p>Copy-number loss: 90% of the chromosome arm is affected by loss, for either allele<\/p>\n<p>LOH: \\(\\ge 90 \\% \\) of the chromosome arm is at copy number 0 in the child clone, but \\(\\le 20 \\% \\) of the chromosome arm is at copy number 0 in the parent clone, for either allele<\/p>\n<p>Amplification: \\(\\ge 90 \\% \\) of the chromosome arm is amplified as per the criteria described above, for either allele.<\/p>\n<p>Defining chromosome-arm-level copy-number events detectable with sample-level analysis<\/p>\n<p>For each sample, we called an arm-level LOH event if at least 0.98 of the analysed genome had a fractional copy-number state below 0.5 for either allele.<\/p>\n<p>Event-ordering analysisIdentifying and assigning top frequently altered driver events to edges of the phylogenetic tree<\/p>\n<p>To select driver SNVs and relevant copy-number events to include in the Bradley\u2013Terry model for ordering driver events, we first classified the genomic driver events for each tumour case in the TRACERx421 primary and primary\u2013metastasis cohorts<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\" title=\"Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525&#x2013;533 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR5\" id=\"ref-link-section-d54970559e13555\" target=\"_blank\" rel=\"noopener\">5<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 6\" title=\"Al Bakir, M. et al. The evolution of non-small cell lung cancer metastases in TRACERx. Nature 616, 534&#x2013;542 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR6\" id=\"ref-link-section-d54970559e13558\" target=\"_blank\" rel=\"noopener\">6<\/a>. These driver events included chromosome-arm amplifications and LOH events, focal (gene-level) amplifications and SNV driver events. We considered amplifications affecting oncogenes on the basis of a list collated from known copy-number drivers and known pan-cancer drivers as described in a previous study<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\" title=\"Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525&#x2013;533 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR5\" id=\"ref-link-section-d54970559e13562\" target=\"_blank\" rel=\"noopener\">5<\/a>. Oncogenes were considered to be amplified if the gene locus was completely contained within a copy-number segment called as amplified on an edge of the phylogenetic tree (using criteria described above). If a focal amplification was overlapping with a chromosome-arm amplification on an edge of the phylogenetic tree, only the chromosome-arm-level event was considered to have affected that edge of the tree. Then, including both clonal and subclonal events, we quantified the frequency of the number of tumours in which any such event was called. We selected the following events that were identified most frequently across the cohort to include in the ordering analysis, for LUAD and LUSC separately:<\/p>\n<p>Event ID: number of top most frequent events selected for ordering analysis:<\/p>\n<p>Chromosome-arm amplifications: 10<\/p>\n<p>Chromosome-arm LOH events: 10<\/p>\n<p>Subclonal chromosome-arm amplifications: 5<\/p>\n<p>Subclonal chromosome-arm LOH events: 5<\/p>\n<p>Focal amplifications (genes not part of an arm event): 10<\/p>\n<p>Subclonal focal amplifications (genes not part of an arm event): 5<\/p>\n<p>All SNVs classified as driver mutations as described in a previous study<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 26\" title=\"Wintersinger, J. A. et al. Reconstructing complex cancer evolutionary histories from multiple bulk DNA samples using pairtree. Blood Cancer Discov. 3, 208&#x2013;219 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR26\" id=\"ref-link-section-d54970559e13591\" target=\"_blank\" rel=\"noopener\">26<\/a>: 10<\/p>\n<p>Subclonal SNVs classified as driver mutations as described in a previous study<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 26\" title=\"Wintersinger, J. A. et al. Reconstructing complex cancer evolutionary histories from multiple bulk DNA samples using pairtree. Blood Cancer Discov. 3, 208&#x2013;219 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR26\" id=\"ref-link-section-d54970559e13598\" target=\"_blank\" rel=\"noopener\">26<\/a>: 5<\/p>\n<p>From this list of the top most frequent alterations, in LUAD and LUSC separately, we further subset to the events that were called as present in at least five tumours.<\/p>\n<p>Specifying the event-ordering model<\/p>\n<p>On the basis of these most frequent alterations, for LUAD and LUSC separately, we counted the number of tumours harbouring an alteration X from the top frequent alteration list occurring on an ancestral node, followed by any descendant node on the tree harbouring alteration Y from the top frequent alteration list (as depicted in Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#Fig4\" target=\"_blank\" rel=\"noopener\">4a<\/a>). If multiple descendant clones harboured the same descendant alteration Y to ancestral alteration X, we counted the ancestor\u2013descendant relationship only once per tumour. These pairwise counts were entered into a Bradley\u2013Terry model including a bias-reduced estimate, which was implemented using the BTm and update functions from the R package BradleyTerry2 (v1.1-2)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 51\" title=\"Turner, H. &amp; Firth, D. Bradley-Terry models in R: the BradleyTerry2 package. J. Stat. Softw. 48, 1&#x2013;21 (2012).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR51\" id=\"ref-link-section-d54970559e13617\" target=\"_blank\" rel=\"noopener\">51<\/a>. When running the ordering analysis for the TRACERx421 primary\u2013metastasis cohort<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 6\" title=\"Al Bakir, M. et al. The evolution of non-small cell lung cancer metastases in TRACERx. Nature 616, 534&#x2013;542 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR6\" id=\"ref-link-section-d54970559e13621\" target=\"_blank\" rel=\"noopener\">6<\/a>, we additionally considered metastatic \u2018seeding\u2019 as an event that could affect an edge of the phylogenetic tree, and therefore the metastatic seeding was ordered in the same way as other top most frequent events. Clones were labelled as being a seeding clone on the basis of the criteria described previously<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 6\" title=\"Al Bakir, M. et al. The evolution of non-small cell lung cancer metastases in TRACERx. Nature 616, 534&#x2013;542 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR6\" id=\"ref-link-section-d54970559e13625\" target=\"_blank\" rel=\"noopener\">6<\/a>.<\/p>\n<p>Expanded subclones<\/p>\n<p>Expanded subclones were defined as non-MRCA tumour clones whose associated set of mutations seem clonal in at least one tumour sample<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\" title=\"Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525&#x2013;533 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR5\" id=\"ref-link-section-d54970559e13638\" target=\"_blank\" rel=\"noopener\">5<\/a>.<\/p>\n<p>Clone classification<\/p>\n<p>We classified clones into classes as follows (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#Fig5\" target=\"_blank\" rel=\"noopener\">5a<\/a>):<\/p>\n<ol class=\"u-list-style-none\">\n<li>\n                  1.<\/p>\n<p>MRCA clones;<\/p>\n<\/li>\n<li>\n                  2.<\/p>\n<p>shared clones (clones with mutations shared between the primary and metastatic samples);<\/p>\n<\/li>\n<li>\n                  3.<\/p>\n<p>seeding clones (the most recent shared clones);<\/p>\n<\/li>\n<li>\n                  4.<\/p>\n<p>primary specific (clones with mutations detected only in primary samples); and<\/p>\n<\/li>\n<li>\n                  5.<\/p>\n<p>metastasis specific (clones with mutations detected only in metastatic samples).<\/p>\n<\/li>\n<\/ol>\n<p>Computing number of SCNAs on a tree edge<\/p>\n<p>ALPACA infers copy-number events for each segment independently; however, it is known that SCNAs can overlap multiple genomic segments<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 4\" title=\"Kaufmann, T. L. et al. MEDICC2: whole-genome doubling aware copy-number phylogenies for cancer evolution. Genome Biol. 23, 241 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR4\" id=\"ref-link-section-d54970559e13719\" target=\"_blank\" rel=\"noopener\">4<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 31\" title=\"Zaccaria, S., El-Kebir, M., Klau, G. W. &amp; Raphael, B. J. Phylogenetic copy-number factorization of multiple tumor samples. J. Comput. Biol. 25, 689&#x2013;708 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR31\" id=\"ref-link-section-d54970559e13722\" target=\"_blank\" rel=\"noopener\">31<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 52\" title=\"El-Kebir, M. et al. Complexity and algorithms for copy-number evolution problems. Algorithms Mol. Biol. 12, 13 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR52\" id=\"ref-link-section-d54970559e13725\" target=\"_blank\" rel=\"noopener\">52<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 53\" title=\"Zeira, R., Zehavi, M. &amp; Shamir, R. A linear-time algorithm for the copy number transformation problem. J. Comput. Biol. 24, 1179&#x2013;1194 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR53\" id=\"ref-link-section-d54970559e13728\" target=\"_blank\" rel=\"noopener\">53<\/a>. Using the ALPACA output, we hence aimed to compute a summary statistic, called the number of SCNAs on a tree edge, \\(N\\), to enumerate the actual number of copy-number events that occurred in the copy-number evolutionary tree inferred by ALPACA while taking into account neighbouring SCNAs. In brief, we computed \\(N\\) as follows. Considering each allele and each chromosome separately, we counted the number of contiguous copy-number segments with a copy-number gain event of the same amplitude, and the number of contiguous segments with a copy-number loss event of the same amplitude, between the parent and the child node. The total number of copy-number events on this tree edge was then computed as the total sum of gains and losses on the tree edge, across all chromosomes, across both alleles. We note that in contrast to previous studies<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 4\" title=\"Kaufmann, T. L. et al. MEDICC2: whole-genome doubling aware copy-number phylogenies for cancer evolution. Genome Biol. 23, 241 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR4\" id=\"ref-link-section-d54970559e13766\" target=\"_blank\" rel=\"noopener\">4<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 52\" title=\"El-Kebir, M. et al. Complexity and algorithms for copy-number evolution problems. Algorithms Mol. Biol. 12, 13 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR52\" id=\"ref-link-section-d54970559e13769\" target=\"_blank\" rel=\"noopener\">52<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 53\" title=\"Zeira, R., Zehavi, M. &amp; Shamir, R. A linear-time algorithm for the copy number transformation problem. J. Comput. Biol. 24, 1179&#x2013;1194 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR53\" id=\"ref-link-section-d54970559e13772\" target=\"_blank\" rel=\"noopener\">53<\/a>, this method does not model the most parsimonious number of interval events required to transform the copy-number profile of the parent clone to that of the child clone. A mathematical description of the specific method used to compute the number of gains, losses and overall number of SCNAs on each edge is available in Supplementary Information section\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#MOESM1\" target=\"_blank\" rel=\"noopener\">4<\/a>.<\/p>\n<p>Comparing rate of SNV and SCNA acquisition per tree edge<\/p>\n<p>To compare the rate of SNVs and SCNAs acquisition on each tree edge, we considered the number of SNVs on each edge divided by the total sum of subclonal SNVs (that is, all mutations on subclonal edges of the tumour phylogeny) in the tumour across all tree edges, and compared this with the number of copy-number events (that is, number of SCNAs) on the same edge, divided by the total sum of subclonal copy-number events across all edges of the tumour. This analysis was run considering only subclones in both the TRACERx421 primary and primary\u2013metastasis cohorts. Further, using the set of all subclones in the TRACERx421 primary\u2013metastasis cohort, we fitted a linear mixed-effects model with the dependent variable being the normalized number of SCNAs, tumour as a random-effect explanatory variable, and the fixed-effect explanatory variables: normalized number of SNVs, histology and clone type (shared clones (reference level), seeding clones, primary clones or metastatic clones). Then, using the same set of all subclones in the TRACERx421 primary\u2013metastasis cohort, we fitted a mixed-effects logistic regression model with the dependent binary variable of whether a clone was metastatic or not, tumour as a random-effect explanatory variable, and the fixed-effect explanatory variables number of SNVs, number of gains on the edge, number of losses on the edge and histology.<\/p>\n<p>Comparing SCNAs in seeding versus non-seeding clones<\/p>\n<p>To identify specific copy-number events occurring on evolutionary trajectories leading to metastatic seeding, we considered only a subset of tumours from the TRACERx421 primary\u2013metastasis cohort that satisfied the following conditions: the tumours had exclusively subclonal seeding; and the tumours had at least one non-seeding trajectory, which is defined as the tree path connecting the MRCA clone to a non-seeding primary clone: a primary clone that has no ancestors that are seeding (n\u2009=\u200971; Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#Fig14\" target=\"_blank\" rel=\"noopener\">8a<\/a>). This set of tumours thus had both of the following tree paths present: paths connecting the MRCA clone to a subclonal seeding clone, and paths connecting the MRCA clone to a subclonal non-seeding primary clone.<\/p>\n<p>For each oncogene in our collated list<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\" title=\"Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525&#x2013;533 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR5\" id=\"ref-link-section-d54970559e13804\" target=\"_blank\" rel=\"noopener\">5<\/a>, for each tumour, we performed a binary classification of whether a copy-number increase has occurred on a trajectory between the (non-seeding) MRCA and any seeding subclone, and a binary classification of whether a copy-number increase has occurred on a trajectory between the non-seeding MRCA and any non-seeding subclone (a copy-number decrease is considered for TSGs, respectively, and the specific copy-number events we considered are listed below). Then, for each oncogene event, we counted the number of tumours for which an increase affected subclonal seeding trajectories and the number of tumours for which an increase affected subclonal non-seeding trajectories. There are a different number of seeding and non-seeding clones in each tumour, and indeed a larger number of non-seeding clones overall, for each oncogene event. Hence, we tested whether the counts of tumours with events in seeding and non-seeding trajectories differed from a background count of the total number of such events affecting seeding and non-seeding trajectories summed across all non-driver genes.<\/p>\n<p>We categorized \u2018increase\u2019 copy-number events at oncogene loci into: gains, gains &gt;1 copy, amplifications. For TSG loci, we categorized \u2018decrease\u2019 copy-number events into: losses, losses &gt;1 copy, LOH and LOH 0|1 (which describes the clone copy-number state of one allele with copy number 0, and the remaining allele with 1 copy).<\/p>\n<p>Calculating CCD<\/p>\n<p>To calculate CCD for clones within the primary cohort, we first removed all of the clones exclusively found in the lymph nodes during the primary surgery, as these clones represent cells that have already migrated from the primary site. For the remaining clones, we concatenated the vectors of predicted clone copy-number states for alleles A and B along the genome for each clone, and computed the pairwise Euclidean distance of this vector between each pair of clones. Then, for each tumour case in the TRACERx421 primary cohort<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\" title=\"Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525&#x2013;533 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR5\" id=\"ref-link-section-d54970559e13819\" target=\"_blank\" rel=\"noopener\">5<\/a>, the maximum value distance between any two clones was chosen as the CCD score for that tumour.<\/p>\n<p>Clinical data and survival analysis<\/p>\n<p>The clinical data were preprocessed in the same way as described in previous work<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\" title=\"Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525&#x2013;533 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR5\" id=\"ref-link-section-d54970559e13831\" target=\"_blank\" rel=\"noopener\">5<\/a>. Multivariable Cox model and Kaplan\u2013Meier analyses were performed as described in previous work<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\" title=\"Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525&#x2013;533 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR5\" id=\"ref-link-section-d54970559e13835\" target=\"_blank\" rel=\"noopener\">5<\/a>. Covariates \u2014 age, stage, pack-years, histology, adjuvant and fraction of the aberrant genome with subclonal SCNAs computed on a sample level (termed \u2018SCNA-ITH\u2019 in ref.\u2009<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\" title=\"Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525&#x2013;533 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR5\" id=\"ref-link-section-d54970559e13839\" target=\"_blank\" rel=\"noopener\">5<\/a>) \u2014 were processed as described in ref. <a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\" title=\"Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525&#x2013;533 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#ref-CR5\" id=\"ref-link-section-d54970559e13843\" target=\"_blank\" rel=\"noopener\">5<\/a> and were included as covariates in the Cox proportional hazards model.<\/p>\n<p>Statistical information<\/p>\n<p>Statistical tests were performed using either R or Python (scipy v1.10.1).<\/p>\n<p>All tests were two-sided, unless stated otherwise, and were carried out using paired or unpaired options as appropriate, unless stated otherwise. Hazard ratios and P values for the survival analysis were computed using the R Survival packages. For all plots involving statistical tests, all data points included are plotted or the number of data points included is specified in the figure legend.<\/p>\n<p>When data is represented using box plots, the box represents the interquartile range with the median line. Whiskers denote the lowest and highest values within 1.5 times the interquartile range from the first and third quartiles, respectively.<\/p>\n<p>Reporting summary<\/p>\n<p>Further information on research design is available in the\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-025-09398-w#MOESM2\" target=\"_blank\" rel=\"noopener\">Nature Portfolio Reporting Summary<\/a> linked to this article.<\/p>\n","protected":false},"excerpt":{"rendered":"The ALPACA algorithm We introduce ALPACA, an algorithm to infer allele-specific copy-number profiles of every tumour clone, given&hellip;\n","protected":false},"author":2,"featured_media":342268,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3846],"tags":[82907,267,3965,3966,121752,97464,70,16,15],"class_list":{"0":"post-342267","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-genetics","8":"tag-computational-models","9":"tag-genetics","10":"tag-humanities-and-social-sciences","11":"tag-multidisciplinary","12":"tag-non-small-cell-lung-cancer","13":"tag-phylogeny","14":"tag-science","15":"tag-uk","16":"tag-united-kingdom"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@uk\/115023808200814864","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/342267","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/comments?post=342267"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/342267\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media\/342268"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media?parent=342267"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/categories?post=342267"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/tags?post=342267"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}