Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).

Linder, J., Srivastava, D. & Yuan, H. Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation. Nat. Genet. 57, 949–961 (2025).

Zhou, J. Sequence-based modeling of three-dimensional genome architecture from kilobase to chromosome scale. Nat. Genet. 54, 725–734 (2022).

Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535–548 (2019).

Kelley, D. R., Snoek, J. & Rinn, J. L. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 26, 990–999 (2016).

Mansour, M. R. et al. Oncogene regulation. An oncogenic super-enhancer formed through somatic mutation of a noncoding intergenic element. Science 346, 1373–1377 (2014).

Halldorsson, B. V. et al. The sequences of 150,119 genomes in the UK Biobank. Nature 607, 732–740 (2022).

Avsec, Ž. et al. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat. Genet. 53, 354–366 (2021).

Cochran, K. et al. Dissecting the cis-regulatory syntax of transcription initiation with deep learning. Preprint at bioRxiv https://doi.org/10.1101/2024.05.28.596138 (2024).

Trevino, A. E. et al. Chromatin and gene-regulatory dynamics of the developing human cerebral cortex at single-cell resolution. Cell 184, 5053–5069 (2021).

Zeng, T. & Li, Y. I. Predicting RNA splicing from DNA sequence using Pangolin. Genome Biol. 23, 103 (2022).

Gschwind, A. R. et al. An encyclopedia of enhancer-gene regulatory interactions in the human genome. Preprint at bioRxiv https://doi.org/10.1101/2023.11.09.563812 (2023).

Ronneberger, O., Fischer, P. & Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proc. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015) 234–241 (Springer, 2015).

Zhou, J., Rizzo, K., Tang, Z. & Koo, P. K. Uncertainty-aware genomic deep learning with knowledge distillation. Preprint at bioRxiv https://doi.org/10.1101/2024.11.13.623485 (2024).

López-Bigas, N., Audit, B., Ouzounis, C., Parra, G. & Guigó, R. Are splicing mutations the most frequent cause of hereditary disease? FEBS Lett. 579, 1900–1903 (2005).

Xu, C. et al. Reference-informed prediction of alternative splicing and splicing-altering mutations from sequences. Genome Res. 34, 1052–1065 (2024).

GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).

Wang, Z. & Burge, C. B. Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. RNA 14, 802–813 (2008).

Fairbrother, W. G., Yeh, R.-F., Sharp, P. A. & Burge, C. B. Predictive identification of exonic splicing enhancers in human genes. Science 297, 1007–1013 (2002).

Alasoo, K. et al. Genetic effects on promoter usage are highly context-specific and contribute to complex traits. eLife 8, e41673 (2019).

Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018).

Chong, R. et al. A multiplexed assay for exon recognition reveals that an unappreciated fraction of rare genetic variants cause large-effect splicing disruptions. Mol. Cell 73, 183–194 (2019).

Albert, F. W. & Kruglyak, L. The role of regulatory variation in complex traits and disease. Nat. Rev. Genet. 16, 197–212 (2015).

Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).

Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. B 82, 1273–1300 (2020).

Karollus, A., Mauermeier, T. & Gagneur, J. Current sequence-based models capture gene expression determinants in promoters but mostly ignore distal enhancers. Genome Biol. 24, 56 (2023).

Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).

Tian, B. & Manley, J. L. Alternative polyadenylation of mRNA precursors. Nat. Rev. Mol. Cell Biol. 18, 18–30 (2017).

Pampari, A. et al. ChromBPNet: bias factorized, base-resolution deep learning models of chromatin accessibility reveal cis-regulatory sequence syntax, transcription factor footprints and regulatory variants. Preprint at bioRxiv https://doi.org/10.1101/2024.12.25.630221 (2025).

Tehranchi, A. K. et al. Pooled ChIP-seq links variation in transcription factor binding to complex disease risk. Cell 165, 730–741 (2016).

Kircher, M. et al. Saturation mutagenesis of twenty disease-associated regulatory elements at single base-pair resolution. Nat. Commun. 10, 3583 (2019).

Shigaki, D. et al. Integration of multiple epigenomic marks improves prediction of variant impact in saturation mutagenesis reporter assay. Hum. Mutat. 40, 1280–1291 (2019).

Liu, Y. et al. The genomic landscape of pediatric and young adult T-lineage acute lymphoblastic leukemia. Nat. Genet. 49, 1211–1218 (2017).

Liu, Y. et al. Discovery of regulatory noncoding variants in individual cancer genomes by using cis-X. Nat. Genet. 52, 811–818 (2020).

Smith, C. et al. TAL1 activation in T-cell acute lymphoblastic leukemia: a novel oncogenic 3′ neo-enhancer. Haematologica 108, 1259–1271 (2023).

Celaj, A. et al. An RNA foundation model enables discovery of disease mechanisms and candidate therapeutics. Preprint at bioRxiv https://doi.org/10.1101/2023.09.20.558508 (2023).

de Almeida, B. P. et al. Targeted design of synthetic enhancers for selected tissues in the Drosophila embryo. Nature 626, 207–211 (2023).

Taskiran, I. I. et al. Cell-type-directed design of synthetic enhancers. Nature 626, 212–220 (2023).

Brixi, G. et al. Genome modeling and design across all domains of life with Evo 2. Preprint at bioRxiv https://doi.org/10.1101/2025.02.18.638918 (2025).

Sasse, A. et al. Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings. Nat. Genet. 55, 2060–2064 (2023).

Huang, C. et al. Personal transcriptome variation is poorly explained by current genomic deep learning models. Nat. Genet. 55, 2056–2059 (2023).

Finucane, H. K. et al. Variant scoring performance across selection regimes depends on variant-to-gene and gene-to-disease components. Preprint at bioRxiv https://doi.org/10.1101/2024.09.17.613327 (2024).

Hingerl, J. C. et al. Scooby: modeling multimodal genomic profiles from DNA sequence at single-cell resolution. Nat. Methods 22, 2275–2285 (2025).

Lal, A. et al. Decoding sequence determinants of gene expression in diverse cellular and disease states. Preprint at bioRxiv https://doi.org/10.1101/2024.10.09.617507 (2025).

Dalla-Torre, H. et al. Nucleotide transformer: building and evaluating robust foundation models for human genomics. Nat. Methods 22, 287–297 (2025).

Karollus, A. et al. Species-aware DNA language models capture regulatory elements and their evolution. Genome Biol. 25, 83 (2024).

Benegas, G., Batra, S. S. & Song, Y. S. DNA language models are powerful predictors of genome-wide variant effects. Proc. Natl Acad. Sci. USA 120, e2311219120 (2023).

Hu, Y. et al. Multiscale footprints reveal the organization of cis-regulatory elements. Nature 638, 779–786 (2025).

Ling, J. P. et al. ASCOT identifies key regulators of neuronal subtype-specific splicing. Nat. Commun. 11, 137 (2020).

Wagner, N. et al. Aberrant splicing prediction across human tissues. Nat. Genet. 55, 861–870 (2023).

Cheng, J. et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023).

Mountjoy, E. et al. An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci. Nat. Genet. 53, 1527–1533 (2021).

Rauluseviciute, I. et al. JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 52, D174–D182 (2024).

Newburger, D. E. & Bulyk, M. L. UniPROBE: an online database of protein binding microarray data on protein–DNA interactions. Nucleic Acids Res. 37, D77–D82 (2009).

Benegas, G., Eraslan, G. & Song, Y. S. Benchmarking DNA sequence models for causal regulatory variant prediction in human genetics. Preprint at bioRxiv https://doi.org/10.1101/2025.02.11.637758 (2025).

Mohammadi, P., Castel, S. E., Brown, A. A. & Lappalainen, T. Quantifying the regulatory effect size of cis-acting genetic variation using allelic fold change. Genome Res. 27, 1872–1884 (2017).

Matsunaga, A. et al. Compound heterozygosity for an apolipoprotein A1 gene promoter mutation and a structural nonsense mutation with apolipoprotein A1 deficiency. Arterioscler. Thromb. Vasc. Biol. 19, 348–355 (1999).

Yüregir, G. T. et al. Hb H disease in a Turkish family resulting from the interaction of a deletional α-thalassaemia-1 and a newly discovered poly A mutation. Br. J. Haematol. 80, 527–532 (1992).

Heibel, S. K. et al. N-carbamylglutamate enhancement of ureagenesis leads to discovery of a novel deleterious mutation in a newly defined enhancer of the NAGS gene and to effective therapy. Hum. Mutat. 32, 1153–1160 (2011).

Kaneko, K. et al. Identification of a novel erythroid-specific enhancer for the ALAS2 gene and its loss-of-function mutation which is associated with congenital sideroblastic anemia. Haematologica 99, 252–261 (2014).

Horn, S. et al. TERT promoter mutations in familial and sporadic melanoma. Science 339, 959–961 (2013).

Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R. & Siepel, A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110–121 (2010).

Collins, F. S. et al. Concordance of a point mutation 5′ to the Gγ globin gene with Gγβ+ hereditary persistence of fetal hemoglobin in the black population. Blood 64, 1292–1296 (1984).

Graham, S. E. et al. The power of genetic diversity in genome-wide association studies of lipids. Nature 600, 675–679 (2021).

Sun, B. B. et al. Plasma proteomic associations with genetics and health in the UK Biobank. Nature 622, 329–338 (2023).

Mbatchou, J. et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet. 53, 1097–1103 (2021).

Chen, J. et al. WNT7B promotes bone formation in part through mTORC1. PLoS Genet. 10, e1004145 (2014).
