{"id":474140,"date":"2025-10-04T18:37:17","date_gmt":"2025-10-04T18:37:17","guid":{"rendered":"https:\/\/www.europesays.com\/uk\/474140\/"},"modified":"2025-10-04T18:37:17","modified_gmt":"2025-10-04T18:37:17","slug":"a-genetic-map-of-human-metabolism-across-the-allele-frequency-spectrum","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/uk\/474140\/","title":{"rendered":"A genetic map of human metabolism across the allele frequency spectrum"},"content":{"rendered":"<p>Study design<\/p>\n<p>The UKB is a prospective cohort study from the UK that contains more than 500,000 volunteers between 40 and 69\u2009years of age at inclusion. The study design, sample characteristics and genotype data have been described elsewhere<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 66\" title=\"Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203&#x2013;209 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR66\" id=\"ref-link-section-d188851467e1951\" target=\"_blank\" rel=\"noopener\">66<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 67\" title=\"Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR67\" id=\"ref-link-section-d188851467e1954\" target=\"_blank\" rel=\"noopener\">67<\/a>. The UKB was approved by the National Research Ethics Service Committee North West Multi-Centre Haydock and all study procedures were performed in accordance with the World Medical Association Declaration of Helsinki ethical principles for medical research. 
We included 460,036 individuals across the three major ancestries in UKB in our analyses for whom inclusion criteria (given consent to further usage of the data, availability of genetic data and passed quality control (QC) of genetic data) applied. Data from UKB were linked to death registries and hospital episode statistics (HES). We used the ancestry assignments as defined by the pan-UKB<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 68\" title=\"Karczewski, K. J. et al. Pan-UK Biobank genome-wide association analyses enhance discovery and resolution of ancestry-enriched effects. Nat. Genet. &#010;                https:\/\/doi.org\/10.1038\/s41588-025-02335-7&#010;                &#010;               (2025).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR68\" id=\"ref-link-section-d188851467e1958\" target=\"_blank\" rel=\"noopener\">68<\/a> and further assigned unclassified individuals to their respective ancestries based on a k-nearest neighbor approach using genetic principal components. All analyses were conducted under UKB applications 44448 and 30418.<\/p>\n<p>Metabolomic measurements<\/p>\n<p>Up to 249 targeted metabolomic measurements were quantified using the Nightingale NMR platform in human EDTA plasma samples. Detailed experimental procedures for the NMR platform are described elsewhere<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 65\" title=\"Julkunen, H. et al. Atlas of plasma NMR biomarkers for health and disease in 118,461 individuals from the UK Biobank. Nat. Commun. 
14, 604 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR65\" id=\"ref-link-section-d188851467e1973\" target=\"_blank\" rel=\"noopener\">65<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 69\" title=\"W&#xFC;rtz, P. et al. Quantitative serum nuclear magnetic resonance metabolomics in large-scale epidemiology: a primer on -omic technologies. Am. J. Epidemiol. 186, 1084&#x2013;1096 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR69\" id=\"ref-link-section-d188851467e1976\" target=\"_blank\" rel=\"noopener\">69<\/a>. The NMR platform covers a wide range of metabolic biomarkers, including lipoprotein lipids, fatty acids and small molecules such as amino acids, ketone bodies and glycolysis metabolites, quantified in molar concentration units. We combine here three data releases that cover the full breadth of the UKB. Metabolomics data were available for 482,276 individuals, including 19,699 samples with data from baseline and repeat visit.<\/p>\n<p>Metabolites were reliably detected, with only one biomarker over 2.5% missingness in releases 1\/2 (creatinine) and release 3 (3-hydroxybutyrate). Ninety-eight percent of the samples had <a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 70\" title=\"Ritchie, S. C. et al. Quality control and removal of technical variation of NMR metabolic biomarker data in ~120,000 UK Biobank participants. Sci. Data 10, 64 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR70\" id=\"ref-link-section-d188851467e1983\" target=\"_blank\" rel=\"noopener\">70<\/a> R package (v2.2, R v4.3.2) for QC and removal of technical variation in the NMR data. 
This includes technical confounders such as sample preparation time, shipping plate well, spectrometer effects, time drift within spectrometers and outlier plates.<\/p>\n<p>We removed samples that were flagged by Nightingale for poor quality and used the MICE (Multivariate Imputation by Chained Equations)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 71\" title=\"Buuren, S. V. &amp; Groothuis-Oudshoorn, K. mice: multivariate imputation by chained equations in R. J. Stat. Softw. 45, 1&#x2013;67 (2011).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR71\" id=\"ref-link-section-d188851467e1990\" target=\"_blank\" rel=\"noopener\">71<\/a> R package to impute the remaining dataset. In total, we imputed 0.16% and 0.17% of data in releases 1\/2 and release 3, respectively.<\/p>\n<p>We observed overall good consistency with the overlapping routine blood biomarkers previously measured in the same cohort (median r\u2009=\u20090.9, range 0.62\u20130.94) (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#Fig14\" target=\"_blank\" rel=\"noopener\">9<\/a>).<\/p>\n<p>Adjustment of metabolomic data for medication use<\/p>\n<p>We sought to adjust the NMR data for medication use, especially cholesterol-lowering medication, to avoid false-positive results driven by medication use in downstream genetic analyses. For male and female participants separately, we fit linear models to quantify the impact of six drug categories on each NMR phenotype: cholesterol-lowering medicine, blood pressure medication, diabetic medication including Metformin usage, oral contraceptive pill or minipill (female only) and hormone replacement therapy (female only) (UKB fields 6177 and 6153) (Supplementary Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#MOESM1\" target=\"_blank\" rel=\"noopener\">6<\/a> and Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#MOESM4\" target=\"_blank\" rel=\"noopener\">18<\/a>).<\/p>\n<p>We used data from individuals with metabolic data available from both the baseline (NMR<sub>baseline<\/sub>) and repeat (NMR<sub>follow-up<\/sub>) assessments and estimated the effect of medication (med terms) in individuals who did not take any drugs at the time of the baseline visit (n\u2009=\u20096,312 male, n\u2009=\u20096,713 female participants) using the following model:<\/p>\n<p>$$\\begin{array}{l}{\\mathrm{NMR}}_{\\mathrm{baseline}} \\sim {\\mathrm{NMR}}_{{{\\mathrm{follow}}}{{\\text{-}}}{{\\mathrm{up}}}}+\\mathrm{age}+\\mathrm{BMI} \\\\+{\\mathrm{med}}_{\\mathrm{cholesterol}}+{\\mathrm{med}}_{\\mathrm{diabetic}}+{\\mathrm{med}}_{\\mathrm{contraception}}+{\\mathrm{med}}_{\\mathrm{hormone}}+{\\mathrm{error}}.\\end{array}$$<\/p>\n<p>We note that the sample sizes for diabetic medication (n<sub>male<\/sub>\u2009=\u200945, n<sub>female<\/sub>\u2009=\u200929), oral contraceptive medication (n\u2009=\u200927) and hormone replacement therapy (n\u2009=\u2009148) were too small to reliably estimate any effects. Effect estimates for diabetic medication were correlated with estimates for cholesterol-lowering medicine. The effect estimates for blood pressure medication were minimal across the phenotypes. 
We thus considered only the impact of cholesterol-lowering medicine and corrected the metabolic data in a sex-specific manner.<\/p>\n<p>Genotyping and GWAS analyses<\/p>\n<p>GWAS was performed on 249 metabolic traits measured by the NMR platform in British European (n\u2009=\u2009434,646), British Central\/South Asian (n\u2009=\u20098,796) and British African participants (n\u2009=\u20096,573) who had complete phenotypic, covariate and genetic information available. We used the Haplotype Reference Consortium-imputed genetic data, including all autosomal chromosomes and the X chromosome. We performed GWAS under the additive model using REGENIE (v3.2.5)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 72\" title=\"Mbatchou, J. et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet. 53, 1097&#x2013;1103 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR72\" id=\"ref-link-section-d188851467e2220\" target=\"_blank\" rel=\"noopener\">72<\/a>, which uses a two-step procedure to account for population structure. We derived a set of high-quality genotyped variants per population by applying the following filters: (minor allele frequency (MAF) &gt;1%, minor allele count (MAC) &gt;100, missingness rate PHWE\u2009&gt;\u20091\u2009\u00d7\u200910\u221215). Furthermore, linkage disequilibrium (LD) pruning was performed using a 1,000-kb window, shifting by 100 variants and removing variants with LD (r<sup>2<\/sup>) &gt;0.8. We used these variants as input for the first step of REGENIE to generate individual trait predictions using the leave-one-chromosome-out scheme. These predictions were then used in the second step, in which individual variants were tested. Models were adjusted for age, sex and the first ten genetic principal components. 
We tested variants with a MAF &gt;0.5%, amounting to 11.5 million variants in British European individuals, 11.5 million variants in British Central\/South Asian individuals and 19.3 million variants in British African individuals.<\/p>\n<p>For initial discovery, we performed a meta-analysis across the three ancestral groups using METAL<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 73\" title=\"Willer, C. J., Li, Y. &amp; Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190&#x2013;2191 (2010).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR73\" id=\"ref-link-section-d188851467e2238\" target=\"_blank\" rel=\"noopener\">73<\/a>. We required variants to be present in at least two ancestral groups. To declare significance, we considered a stringent P-value threshold (2.0\u2009\u00d7\u200910\u221210) by dividing the standard genome-wide threshold by the number of metabolic phenotypes (5.0\u2009\u00d7\u200910\u22128\/249).<\/p>\n<p>We tested our results for genomic inflation and calculated the single-nucleotide polymorphism (SNP)-based heritability using LD-score regression<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 74\" title=\"Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 
47, 291&#x2013;295 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR74\" id=\"ref-link-section-d188851467e2252\" target=\"_blank\" rel=\"noopener\">74<\/a> (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#MOESM4\" target=\"_blank\" rel=\"noopener\">19<\/a>).<\/p>\n<p>Regional clumping and fine-mapping<\/p>\n<p>We used regional clumping (\u00b1500\u2009kb) around sentinel variants from the analyses including British European samples to select independent genomic regions associated with a metabolic phenotype and collapsed neighboring regions using BEDtools (v2.30.0). We treated the extended MHC region (chr6: 25.5\u201334.0\u2009Mb) as one region.<\/p>\n<p>Within each region of interest, excluding the MHC region, we performed statistical fine-mapping for all phenotypes associated with that region using the \u2018Sum of single effects\u2019 model (SuSiE) implemented in the susieR (v0.12.35) R package<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 75\" title=\"Wang, G., Sarkar, A., Carbonetto, P. &amp; Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Ser. B 82, 1273&#x2013;1300 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR75\" id=\"ref-link-section-d188851467e2271\" target=\"_blank\" rel=\"noopener\">75<\/a>. In brief, SuSiE uses a Bayesian framework for variable selection in a multiple regression problem with the aim to identify sets of independent variants each of which probably contains the true causally underlying genetic variant. We implemented the workflow using default prior and parameter settings, apart from the minimum absolute correlation, which we set to 0.1. 
Because SuSiE is implemented in a linear regression framework, we used the GWAS summary statistics with a matching correlation matrix of dosage genotypes instead of individual-level data to implement fine-mapping (susie_rss()) as recommended by the authors<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 75\" title=\"Wang, G., Sarkar, A., Carbonetto, P. &amp; Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Ser. B 82, 1273&#x2013;1300 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR75\" id=\"ref-link-section-d188851467e2275\" target=\"_blank\" rel=\"noopener\">75<\/a>.<\/p>\n<p>To determine the appropriate number of credible sets within each region, we iterated over the maximum credible sets parameter in susieR from two to ten, thus generating fine-mapped results under a range of maximum numbers of credible sets. For each collection of credible sets, we pruned sets whose lead variant was correlated with the lead variant of another credible set (r<sup>2<\/sup>\u2009&gt;\u20090.25). After pruning, we retained the fine-mapped results with the largest number of credible sets.<\/p>\n<p>We performed several sensitivity analyses by computing joint models per locus\u2013phenotype combination, jointly modeling the effect of all distinct lead credible set variants in a single linear model. Subsequently, we retained only credible sets where the lead variant reached genome-wide significance (P\u2009&lt;\u20095.0\u2009\u00d7\u200910<sup>\u22128<\/sup>) in both marginal and joint statistics. Furthermore, we ensured the estimated coefficients were directionally concordant and of similar magnitude between joint and marginal models (\u00b125%). 
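<\/p>\n<p>The retention rule described above can be illustrated with a minimal Python sketch. It is not the code used in the study, which fitted the joint models in R. The LeadVariant container and its field names are hypothetical, and measuring the 25% tolerance relative to the marginal estimate is an assumption, because the text does not state the reference value.<\/p>\n
```python
from dataclasses import dataclass

GENOME_WIDE_P = 5.0e-8  # genome-wide significance threshold quoted in the text
MAX_REL_DIFF = 0.25     # +/-25% tolerance quoted in the text


@dataclass
class LeadVariant:
    # Hypothetical container; the field names are illustrative only.
    variant_id: str
    beta_marginal: float
    p_marginal: float
    beta_joint: float
    p_joint: float


def keep_credible_set(v: LeadVariant) -> bool:
    """Return True if the lead variant passes all three checks."""
    # Check 1: significant in both the marginal and the joint model.
    significant = v.p_marginal < GENOME_WIDE_P and v.p_joint < GENOME_WIDE_P
    # Check 2: same sign of effect in both models.
    same_direction = v.beta_marginal * v.beta_joint > 0
    # Check 3 (assumption): difference measured relative to the marginal estimate.
    similar_size = (
        abs(v.beta_joint - v.beta_marginal) <= MAX_REL_DIFF * abs(v.beta_marginal)
    )
    return significant and same_direction and similar_size
```
<p>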
Linear models were implemented in R using the glm() function and used only unrelated British European participants and the same set of covariates as described above.<\/p>\n<p>Finally, we used LD clumping (r2\u2009&gt;\u20090.6) to identify credible sets shared across metabolic phenotypes.<\/p>\n<p>We computed the correlation matrix with LDscore v2.0 using genetic data from 50,000 randomly selected, unrelated White European UKB participants. In situations where SuSiE did not deliver a credible set, we used the Wakefield approximation<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 76\" title=\"Wakefield, J. Bayes factors for genome-wide association studies: comparison with P-values. Genet. Epidemiol. 33, 79&#x2013;86 (2009).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR76\" id=\"ref-link-section-d188851467e2305\" target=\"_blank\" rel=\"noopener\">76<\/a> to compute 95%-credible sets.<\/p>\n<p>Replication of genetic associations<\/p>\n<p>We replicated our trans-ancestral genetic signals using two independent studies: (1) the so-far largest published mGWAS<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 3\" title=\"Karjalainen, M. K. et al. Genome-wide characterization of circulating metabolic biomarkers. Nature 628, 130&#x2013;138 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR3\" id=\"ref-link-section-d188851467e2317\" target=\"_blank\" rel=\"noopener\">3<\/a> and (2) a parallel effort using overlapping UKB data<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 9\" title=\"Tambets, R. et al. Genome-wide association study for circulating metabolites in 619,372 individuals. 
Preprint at medRxiv &#010;                https:\/\/doi.org\/10.1101\/2024.10.15.24315557&#010;                &#010;               (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR9\" id=\"ref-link-section-d188851467e2321\" target=\"_blank\" rel=\"noopener\">9<\/a>, both using the same NMR platform. We considered a set of metabolic traits that were directly measured by the NMR platform and not inferred from other traits to avoid multiplicative errors in these more sensitive phenotypes. In total, we were able to match 144 (Karjalainen et al.<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 3\" title=\"Karjalainen, M. K. et al. Genome-wide characterization of circulating metabolic biomarkers. Nature 628, 130&#x2013;138 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR3\" id=\"ref-link-section-d188851467e2325\" target=\"_blank\" rel=\"noopener\">3<\/a>) and 169 (Tambets et al.<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 9\" title=\"Tambets, R. et al. Genome-wide association study for circulating metabolites in 619,372 individuals. 
Preprint at medRxiv &#010;                https:\/\/doi.org\/10.1101\/2024.10.15.24315557&#010;                &#010;               (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR9\" id=\"ref-link-section-d188851467e2329\" target=\"_blank\" rel=\"noopener\">9<\/a>) metabolic traits, for which we compared sentinel variants that passed metabolome-adjusted, genome-wide significance in our trans-ancestral meta-analysis and that overlapped between the studies.<\/p>\n<p>Causal gene assignment<\/p>\n<p>To assign candidate genes for all metabolite QTLs residing outside the MHC region, we first collected annotations for each genetic variant or proxies thereof (r2\u2009&gt;\u20090.6), including distance to the gene body and putative functional consequences based on the Variant Effect Predictor (VEP) tool offered by Ensembl. We further collated up to ten closest genes within a 2-Mb window and subsequent gene features such as: (1) eQTL evidence for a given variant\u2013gene pair for each tissue available in the eQTL Catalogue release 7<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 77\" title=\"Kerimov, N. et al. A compendium of uniformly processed human gene expression and splicing quantitative trait loci. Nat. Genet. 53, 1290&#x2013;1299 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR77\" id=\"ref-link-section-d188851467e2345\" target=\"_blank\" rel=\"noopener\">77<\/a>; (2) evidence of being annotated as metabolic in the MGI or Orphanet databases as defined in ProGem<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 19\" title=\"Stacey, D. et al. ProGeM: a framework for the prioritization of candidate causal genes at molecular quantitative trait loci. Nucleic Acids Res. 
47, e3 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR19\" id=\"ref-link-section-d188851467e2349\" target=\"_blank\" rel=\"noopener\">19<\/a>; (3) evidence of being listed in the Online Mendelian Inheritance in Man (OMIM) database<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 39\" title=\"Amberger, J. S., Bocchini, C. A., Schiettecatte, F., Scott, A. F. &amp; Hamosh, A. O. M. I. M. org: Online Mendelian Inheritance in Man (OMIM&#xAE;), an online catalog of human genes and genetic disorders. Nucleic Acids Res. 43, D789&#x2013;D798 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR39\" id=\"ref-link-section-d188851467e2353\" target=\"_blank\" rel=\"noopener\">39<\/a>; (4) and evidence of being an already assigned drug target in Open Targets<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 78\" title=\"Ochoa, D. et al. The next-generation Open Targets Platform: reimagined, redesigned, rebuilt. Nucleic Acids Res. 51, D1353&#x2013;D1359 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR78\" id=\"ref-link-section-d188851467e2357\" target=\"_blank\" rel=\"noopener\">78<\/a> clinical stages III and IV.<\/p>\n<p>With no universally accepted standard for variant-to-gene assignments, we relied on prior biological and genomic information to create three sets of \u2018putative true positive\u2019 (PTP) set: genes part of cholesterol pathway in the Kyoto Encyclopedia of Genes and Genomes (KEGG)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 79\" title=\"Kanehisa, M. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 
28, 27&#x2013;30 (2000).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR79\" id=\"ref-link-section-d188851467e2364\" target=\"_blank\" rel=\"noopener\">79<\/a> or REACTOME<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 80\" title=\"Milacic, M. et al. The Reactome Pathway Knowledgebase 2024. Nucleic Acids Res. 52, D672&#x2013;D678 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR80\" id=\"ref-link-section-d188851467e2368\" target=\"_blank\" rel=\"noopener\">80<\/a> database (n\u2009=\u20096,791, 722 unique SNPs), lipid pathway (n\u2009=\u20095,670, 603 unique SNPs) and amino acid-related pathway (n\u2009=\u20098,349, 895 unique SNPs). We used all fine-mapped SNPs associated with metabolites classified in the respective NMR metabolite class (Cholesterol: cholesterol, cholesteryl esters, free cholesterol; Lipid: total lipids, other lipids, relative lipid concentration, phospholipids; Amino Acid: amino acid) in the PTP set and used overlapping SNPs in only one PTP set. We trained (7:3 training:test ratio without overlapping variants) a random forest classifier using fivefold cross-validation with subsampling to account for the unbalanced datasets (scikit-learn v1.4.1). We used the balanced accuracy score to choose the best-performing forest from each training set. Subsequently, we used the best-performing classifier from each PTP set to assign candidate scores for all putative effector genes across the entire set of metabolite QTLs. We calculated the median score across classifiers and selected the highest-scoring gene per variant. Within each PTP set, we omitted features used to define true positive sets. Each of the three classifiers exhibited consistent performance (mean ROC-AUC: 0.80, mean balanced accuracy score 0.69) (Supplementary Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#MOESM1\" target=\"_blank\" rel=\"noopener\">7<\/a>). We used the sum across all three classifiers to assign effector gene scores but present only genes as potential effector genes that reached sufficient support as indicated by largest difference between consecutively prioritized genes.<\/p>\n<p>To provide another layer of evidence for assignment of causal genes at metabolic loci, we performed cis-colocalization with protein targets measured in the independent Fenland study<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 22\" title=\"Pietzner, M. et al. Mapping the proteo-genomic convergence of human diseases. Science 374, eabj1541 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR22\" id=\"ref-link-section-d188851467e2391\" target=\"_blank\" rel=\"noopener\">22<\/a>. Cis (for example, gene body\u2009\u00b1\u2009500\u2009kb) summary statistics were preprocessed using MungeSumStats<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 81\" title=\"Murphy, A. E., Schilder, B. M. &amp; Skene, N. G. MungeSumstats: a Bioconductor package for the standardization and quality control of many GWAS summary statistics. Bioinformatics 37, 4593&#x2013;4596 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR81\" id=\"ref-link-section-d188851467e2398\" target=\"_blank\" rel=\"noopener\">81<\/a>. 
To relax the single causal variant assumption, we used a colocalization approach where we fine-mapped all traits with SuSiE and then performed colocalization among all credible sets using functionality of the coloc (v5.2.3)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 82\" title=\"Wallace, C. Eliciting priors and relaxing the single causal variant assumption in colocalisation analyses. PLoS Genet. 16, e1008720 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR82\" id=\"ref-link-section-d188851467e2402\" target=\"_blank\" rel=\"noopener\">82<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 83\" title=\"Wallace, C. A more accurate method for colocalisation analysis allowing for multiple causal variants. PLoS Genet. 17, e1009440 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR83\" id=\"ref-link-section-d188851467e2405\" target=\"_blank\" rel=\"noopener\">83<\/a> and susieR (v0.12.35)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 75\" title=\"Wang, G., Sarkar, A., Carbonetto, P. &amp; Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Ser. B 82, 1273&#x2013;1300 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR75\" id=\"ref-link-section-d188851467e2410\" target=\"_blank\" rel=\"noopener\">75<\/a> R packages. 
For this, we set the prior probability that a SNP is associated with both traits to 5\u2009\u00d7\u200910<sup>\u22126<\/sup> and restricted the maximum number of credible sets for the outcome data to five<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 82\" title=\"Wallace, C. Eliciting priors and relaxing the single causal variant assumption in colocalisation analyses. PLoS Genet. 16, e1008720 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR82\" id=\"ref-link-section-d188851467e2416\" target=\"_blank\" rel=\"noopener\">82<\/a>.<\/p>\n<p>Tissue enrichment of metabolic loci<\/p>\n<p>We tested whether genes proximal to metabolic loci and assigned effector genes were enriched in tissue compartments by leveraging data from the Human Protein Atlas<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 84\" title=\"Uhl&#xE9;n, M. et al. Tissue-based map of the human proteome. Science 347, 1260419 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR84\" id=\"ref-link-section-d188851467e2428\" target=\"_blank\" rel=\"noopener\">84<\/a>. Specifically, we used a two-sided Fisher\u2019s exact test to assess whether metabolic genes were enriched among tissue-specific genes (tissue-enriched or tissue-enhanced as defined by the Protein Atlas) against all protein-coding genes as background.<\/p>\n<p>Pleiotropy assignment and overlap with the GWAS Catalog<\/p>\n<p>To assign modes of pleiotropy for each mQTL, we first clumped lead credible set variants across NMR measures by LD, collating variants with r<sup>2<\/sup>\u2009\u2265\u20090.6 as a single signal, referred to hereafter as an mQTL group. This was done based on dosage files of all unrelated British European UKB participants and implemented with the igraph (v2.0.1.1) package in R. 
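<\/p>\n<p>As a rough illustration of this grouping step, the Python sketch below collapses variants into groups by taking connected components of a graph whose edges join variant pairs at or above the LD threshold. The study used igraph in R; treating a group as a connected component is an assumption here, as are the function name and the input layout (a dictionary of pairwise LD values). Under that assumption two variants can share a group through an intermediate variant even if their own pairwise LD is below the threshold.<\/p>\n
```python
from collections import defaultdict

R2_THRESHOLD = 0.6  # LD threshold quoted in the text


def group_by_ld(variants, r2_pairs, threshold=R2_THRESHOLD):
    """Group variants into connected components of the LD graph.

    variants: iterable of variant identifiers.
    r2_pairs: dict mapping (variant_a, variant_b) to their pairwise LD
              (hypothetical input layout, for illustration only).
    """
    parent = {v: v for v in variants}

    def find(v):
        # Walk up to the root, halving the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Join every pair of variants whose LD meets the threshold.
    for (a, b), r2 in r2_pairs.items():
        if r2 >= threshold:
            parent[find(a)] = find(b)

    # Collect the members of each component under its root.
    groups = defaultdict(list)
    for v in parent:
        groups[find(v)].append(v)
    return sorted(sorted(members) for members in groups.values())
```
<p>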
For each mQTL, we computed pairwise Pearson correlation coefficients among associated NMR measures. We classified each mQTL group using two quantities: (1) the 25th percentile of all pairwise correlations, and (2) the Pearson correlation coefficient between the association strength for each measure (\u2212log<sub>10<\/sub>(P value)) and its correlation coefficient with the most strongly associated measure within the mQTL. The latter measures the extent to which the association with multiple NMR measures at a given locus (\u2018pleiotropy\u2019) can be explained by their correlation with the most strongly associated measure. By contrasting these two quantities across all mQTLs, we defined the following five groups: (1) \u2018specific\u2019 mQTLs associated with \u22643 highly correlated NMR measures (rho \u22650.6); (2) \u2018pathway pleiotropic\u2019 mQTLs associated with highly correlated NMR measures (rho \u22650.6) that followed the described association pattern (rho \u22650.6); (3) \u2018proportional pleiotropic\u2019 mQTL groups associated with, in part, uncorrelated NMR measures but highly correlated association statistics (rho \u22650.6); (4) \u2018disproportional pleiotropic\u2019 mQTLs associated with highly correlated NMR measures (rho \u22650.6), but without evidence that this translated into a correlation of association statistics (rho <\/p>\n<p>To quantify the extent to which our pleiotropy assignment extends beyond the NMR measures analyzed here, we intersected mQTLs and proxies thereof with results reported in the GWAS Catalog (downloaded 20 May 2024). We first restricted GWAS Catalog entries to those that had mapped traits (to minimize double counting), met genome-wide significance (P\u2009&lt;\u20095\u2009\u00d7\u200910<sup>\u22128<\/sup>) and had location information available. We further dropped results similar to NMR measures based on broad Experimental Factor Ontology (EFO) terms (for example, EFO:0005105 and child terms indicating \u2018lipid or lipoprotein measurement\u2019). 
To further account for traits mapping to similar categories, we iteratively traced mapped EFO terms back to broader parent terms. We finally classified mQTLs as \u2018specific\u2019 in the GWAS Catalog if they were associated with fewer than five parent EFO terms and as \u2018unspecific\u2019 otherwise.<\/p>\n<p>Integration with cardiovascular endpoints<\/p>\n<p>We next aimed to investigate the shared genetic basis of the 249 NMR measures and 25 selected CVD traits. We used public databases (GWAS Catalog, openGWAS, CVD-KP) to collect CVD data comprising the largest currently publicly available GWAS datasets on CAD and myocardial infarction, angina pectoris, aortic aneurysm, heart failure and stroke, and peripheral arterial disease, including two to five subtypes for some phenotypes (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#MOESM4\" target=\"_blank\" rel=\"noopener\">13<\/a>). Data were harmonized and, if necessary, lifted over to GRCh37 using the MungeSumstats (v1.13.2) R package<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 81\" title=\"Murphy, A. E., Schilder, B. M. &amp; Skene, N. G. MungeSumstats: a Bioconductor package for the standardization and quality control of many GWAS summary statistics. Bioinformatics 37, 4593&#x2013;4596 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR81\" id=\"ref-link-section-d188851467e2468\" target=\"_blank\" rel=\"noopener\">81<\/a>. 
We queried mQTL lead variants and proxies in strong LD (r<sup>2<\/sup>\u2009&gt;\u20090.8; LD backbone based on UKB, as described above) of each NMR trait in each region and corresponding summary statistics for each CVD trait.<\/p>\n<p>To investigate \u2018locus\u2019 effects, we performed statistical colocalization for all combinations of NMR traits and CVD traits as described before (see \u2018Causal gene assignment\u2019 section).<\/p>\n<p>To estimate \u2018level\u2019 effects of NMR metabolite concentrations on CVD outcomes, we performed Mendelian randomization (MR) analysis using the TwoSampleMR package (v0.5.1), implementing the inverse-variance weighted and the MR-Egger methods. We used all 249 NMR metabolites as exposure variables and the 25 CVDs as outcome variables, and separately assessed four sets of instruments: (1) sentinel variants, (2) lead credible set variants, (3) lead credible set variants restricted for molecular pleiotropy (for example, \u2018pathway pleiotropy\u2019) and (4) lead credible set variants restricted for both molecular and phenotypic pleiotropy. We used the Wald ratio method to estimate the effect of NMR concentrations on CVD outcomes using only single genetic variants<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 85\" title=\"Burgess, S., Small, D. S. &amp; Thompson, S. G. A review of instrumental variable estimators for Mendelian randomization. Stat. Methods Med. Res. 26, 2333&#x2013;2355 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR85\" id=\"ref-link-section-d188851467e2482\" target=\"_blank\" rel=\"noopener\">85<\/a>. We used MR-Egger to test for evidence of pleiotropy, taking an intercept P value &gt;0.0001 to indicate no evidence of pleiotropy, and checked for concordance between the effect estimates of inverse-variance weighted Mendelian randomization (IVW-MR), MR-Egger and single genetic variant MR. 
We controlled the FDR at 5% (ref. <a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 86\" title=\"Benjamini, Y. &amp; Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289&#x2013;300 (1995).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR86\" id=\"ref-link-section-d188851467e2489\" target=\"_blank\" rel=\"noopener\">86<\/a>). To further limit the possible extent of pleiotropic associations, we only reported \u2018level effects\u2019 passing these filters in the variant sets 2\u20134, prioritizing the association in the more stringent variant set.<\/p>\n<p>The overlap of \u2018locus effects\u2019 showing no \u2018disproportional pleiotropy\u2019 according to the section \u2018Pleiotropy assignment and overlap with the GWAS Catalog\u2019 as well as a significant single variant MR (FDR 5%) and \u2018level effects\u2019 calculated from metabolite-specific or metabolite- and phenome-specific variants was used to identify gene\u2013metabolite pairs associated with CVD risk independent of LDL metabolism. We considered loci as independent from LDL metabolism if they did not associate with clinical LDL cholesterol at the locus with P\u2009\u221210 and the effect estimate of any variant on clinical LDL-C ranked upward the 80th percentile of all effect estimates at the locus.<\/p>\n<p>Whole exome sequencing data QC for rare variant analyses<\/p>\n<p>An in-depth description of whole exome sequencing, including experimental details, variant calling and standard QC measures for the UKB has been extensively reported by Backman et al.<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 87\" title=\"Backman, J. D. et al. Exome sequencing and analysis of 454,787 UK Biobank participants. 
Nature 599, 628&#x2013;634 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR87\" id=\"ref-link-section-d188851467e2510\" target=\"_blank\" rel=\"noopener\">87<\/a>. We performed additional QC steps at the UKB Research Analysis Platform (RAP; <a href=\"https:\/\/ukbiobank.dnanexus.com\/\" target=\"_blank\" rel=\"noopener\">https:\/\/ukbiobank.dnanexus.com\/<\/a>).<\/p>\n<p>We used bcftools (v1.15.1) to process population-level Variant Call Format (pVCF) files. Initially, we normalized the data using the reference sequence GRCh38 build, followed by splitting multiallelic variants. Subsequently, we conducted QC on these variants using a set of parameters outlined below to filter high-quality variants for downstream genetic analyses. Genotypes for SNPs were set to missing if the read depth was less than 7 (or less than 10 for INDELs) or if the genotype quality was below 20. Furthermore, we excluded variants if the allele balance was less than 0.25 or greater than 0.8 in heterozygous carriers. Finally, we excluded variants with missingness &gt;50%.<\/p>\n<p>Variant annotation and gene burden masks<\/p>\n<p>Variants were annotated using ENSEMBL VEP<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 88\" title=\"McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR88\" id=\"ref-link-section-d188851467e2532\" target=\"_blank\" rel=\"noopener\">88<\/a> (v106.1) with the most severe consequence for each variant chosen across all protein-coding transcripts. We further utilized additional plugins REVEL<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 89\" title=\"Ioannidis, N. M. et al. REVEL: an ensemble method for predicting the pathogenicity of rare missense variants. Am. 
J. Hum. Genet. 99, 877&#x2013;885 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR89\" id=\"ref-link-section-d188851467e2536\" target=\"_blank\" rel=\"noopener\">89<\/a>, CADD v1.6<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 90\" title=\"Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310&#x2013;315 (2014).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR90\" id=\"ref-link-section-d188851467e2540\" target=\"_blank\" rel=\"noopener\">90<\/a> and LOFTEE<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 91\" title=\"Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434&#x2013;443 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR91\" id=\"ref-link-section-d188851467e2544\" target=\"_blank\" rel=\"noopener\">91<\/a> for variant annotation. Based on these scores, we defined six partially overlapping variant masks: (1) high-confidence predicted LoF (pLOF, based on LOFTEE and includes stop-gained, splice site disrupting, and frameshift variants); (2) any pLOF assigned high impact by VEP; (3) pLOF and high-impact missense variants (CADD score &gt;20 or REVEL score &gt;0.5); (4) pLOF and any missense variants; (5) only high-impact variants; and (6) any missense variants but not pLOF. We tested synonymous variants separately as a negative control. We tested each mask in different MAF bins, using 0.5% and 0.005% as thresholds.<\/p>\n<p>We performed rare variant association testing (RVAT) using whole exome sequencing (WES) data across 249 NMR phenotypes using REGENIE (v3.1.1) via the DNAnexus Swiss Army Knife tool (v4.9.1). 
As for the common variant GWASs, we used the two-step approach implemented in REGENIE. We additionally generated step 1 leave-one-chromosome-out (LOCO) files with and without adjusting for common signals via a polygenic score (PGS derived from all lead credible set variants per NMR trait) in the RVAT models per phenotype. All RVAT models were then adjusted for PGS in addition to age, biological sex, fasting duration and the first ten genetic PCs. We first performed aggregated gene burden testing across 19,026 genes using the set of masks defined above. For gene burden testing, we used the aggregated Cauchy association test to estimate P values for each gene across masks and allele frequency bins. The aggregated Cauchy association test first computes P values for all sets defined by various masks within a gene and then takes these P values as input to compute one P value for the respective gene via a well-approximated Cauchy distribution.<\/p>\n<p>We performed single variant association testing for exonic variants (ExWAS). 
For the ExWAS, we tested variants with MAC &gt;5 and reported results for variants with MAF <\/p>\n<p>We considered findings as robust if they passed multiple-testing-corrected statistical significance (gene burden: P\u2009\u22128 (corrected for the number of genes\u2009\u00d7\u2009number of traits); ExWAS: P\u2009\u221210 (same as for common variant GWAS, conventional genome-wide significance corrected for the number of traits)) in both the model with and without adjusting for the common variant PGS and effect sizes did not differ by more than 20% between these models, as this might otherwise indicate that rare variant findings cannot clearly be distinguished from common variant effects.<\/p>\n<p>Phenotype definition<\/p>\n<p>To systematically test for phenotypic consequences of genes identified through rare variant analysis, we collated 626 disease entities following previous work<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 1\" title=\"Surendran, P. et al. Rare and common genetic determinants of metabolic individuality and their effects on human health. Nat. Med. 28, 2321&#x2013;2332 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#ref-CR1\" id=\"ref-link-section-d188851467e2587\" target=\"_blank\" rel=\"noopener\">1<\/a> by aggregating information from self-report, HES, death certificates and primary care data (45% of the UKB population). Each disease entity had at least one significant common variant, and we used a similar analysis workflow using REGENIE as described for NMR measures but using logistic regression with saddle point approximation.<\/p>\n<p>Integration of OMIM<\/p>\n<p>We downloaded the OMIM gene\u2013disease list (9 November 2023) and kept 7,327 unique entries after filtering for gene entries with high confidence (level 3). 
We computed the enrichment of genes associated with any NMR measure from rare variant or gene burden analysis against a background of 19,989 protein coding genes using Fisher\u2019s exact test.<\/p>\n<p>Reporting summary<\/p>\n<p>Further information on research design is available in the <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02355-3#MOESM2\" target=\"_blank\" rel=\"noopener\">Nature Portfolio Reporting Summary<\/a> linked to this article.<\/p>\n","protected":false},"excerpt":{"rendered":"Study design The UKB is a prospective cohort study from the UK that contains more than 500,000 volunteers&hellip;\n","protected":false},"author":2,"featured_media":474141,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3846],"tags":[3971,3973,3967,3970,8994,3972,3968,267,7189,3969,70,16,15],"class_list":{"0":"post-474140","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-genetics","8":"tag-agriculture","9":"tag-animal-genetics-and-genomics","10":"tag-biomedicine","11":"tag-cancer-research","12":"tag-epidemiology","13":"tag-gene-function","14":"tag-general","15":"tag-genetics","16":"tag-genome-wide-association-studies","17":"tag-human-genetics","18":"tag-science","19":"tag-uk","20":"tag-united-kingdom"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@uk\/115317351572424485","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/474140","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europes
ays.com\/uk\/wp-json\/wp\/v2\/comments?post=474140"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/474140\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media\/474141"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media?parent=474140"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/categories?post=474140"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/tags?post=474140"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}