{"id":55284,"date":"2025-04-27T17:21:17","date_gmt":"2025-04-27T17:21:17","guid":{"rendered":"https:\/\/www.europesays.com\/uk\/55284\/"},"modified":"2025-04-27T17:21:17","modified_gmt":"2025-04-27T17:21:17","slug":"integrating-the-environmental-and-genetic-architectures-of-aging-and-mortality","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/uk\/55284\/","title":{"rendered":"Integrating the environmental and genetic architectures of aging and mortality"},"content":{"rendered":"<p>Study design and participants<\/p>\n<p>The UKB is a prospective cohort study with extensive genetic and phenotype data available for 502,505 individuals resident in the UK<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 36\" title=\"Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203&#x2013;209 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR36\" id=\"ref-link-section-d344843148e2348\" target=\"_blank\" rel=\"noopener\">36<\/a>. The full UKB protocol<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 37\" title=\"Protocol for a Large-Scale Prospective Epidemiological Resource (UK Biobank, 2007); &#010;                  https:\/\/www.ukbiobank.ac.uk\/media\/gnkeyh2q\/study-rationale.pdf&#010;                  &#010;                \" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR37\" id=\"ref-link-section-d344843148e2352\" target=\"_blank\" rel=\"noopener\">37<\/a> is available online. All statistical analyses were carried out using R v.4.2.2. 
and PLINK v.2.0.<\/p>\n<p>Exposures<\/p>\n<p>We considered all non-genetic variables available as of 24 July 2020 that were collected or derived (for example, air pollution and Townsend deprivation index) at baseline, had Supplementary Information and Supplementary Tables <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#MOESM1\" target=\"_blank\" rel=\"noopener\">9<\/a> and <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#MOESM1\" target=\"_blank\" rel=\"noopener\">10<\/a>), 176 unique exposures remained that were available in the full cohort and were common to both women and men. All continuous exposure variables were centered and standardized before analysis, except for age at recruitment. All ordinal categorical variables were recoded to only test linear associations and other polynomial contrasts (for example, quadratic or cubic associations) were not assessed. All nominal categorical exposures were analyzed with the most common category set as the reference. All \u2018mark all that apply\u2019 questions were recoded as binary dummy variables. 
Detailed data dictionaries including all exposures used in imputation and XWAS steps are included in Supplementary Files <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#MOESM3\" target=\"_blank\" rel=\"noopener\">1<\/a> and <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#MOESM3\" target=\"_blank\" rel=\"noopener\">2<\/a>.<\/p>\n<p>Outcomes<\/p>\n<p>Detailed information about the linkage procedure<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 38\" title=\"Mortality Data: Linkage to Death Registries (UK Biobank, 2023); &#010;                  https:\/\/biobank.ctsu.ox.ac.uk\/crystal\/refer.cgi?id=115559&#010;                  &#010;                \" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR38\" id=\"ref-link-section-d344843148e2387\" target=\"_blank\" rel=\"noopener\">38<\/a> with national registries for mortality and cause of death information is available online. Mortality data were accessed from the UKB data portal on 4 May 2022, with a censoring date of 30 September 2021 or 31 October 2021 for participants recruited in England\/Scotland or Wales, respectively (11\u201315\u2009years of follow-up).<\/p>\n<p>Procedures for calculating proteomic aging in the UKB were described previously<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 19\" title=\"Argentieri, M. A. et al. Proteomic aging clock predicts mortality and risk of common age-related diseases in diverse populations. Nat. Med. 
30, 2450&#x2013;2460 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR19\" id=\"ref-link-section-d344843148e2394\" target=\"_blank\" rel=\"noopener\">19<\/a>. Aging biomarkers (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#MOESM1\" target=\"_blank\" rel=\"noopener\">6<\/a>) were measured using baseline nonfasting blood serum samples as previously described<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 39\" title=\"Elliott, P. &amp; Peakman, T. C. The UK Biobank sample handling and storage protocol for the collection, processing and archiving of human blood and urine. Int. J. Epidemiol. 37, 234&#x2013;244 (2008).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR39\" id=\"ref-link-section-d344843148e2401\" target=\"_blank\" rel=\"noopener\">39<\/a>. Data on leukocyte telomere length were only available in a slightly smaller sample (n\u2009=\u2009472,506) than other biomarkers and were not imputed. Biomarkers were previously adjusted for technical variation by the UKB, with sample processing<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 40\" title=\"UK Biobank Biomarker Project. 
Companion Document to Accompany Serum Biomarker Data (UK Biobank, 2019); &#010;                  https:\/\/biobank.ndph.ox.ac.uk\/showcase\/showcase\/docs\/serum_biochemistry.pdf&#010;                  &#010;                \" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR40\" id=\"ref-link-section-d344843148e2408\" target=\"_blank\" rel=\"noopener\">40<\/a> and quality control<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 41\" title=\"Biomarker Assay Quality Procedures: Approaches Used to Minimise Systematic and Random Errors (and the Wider Epidemiological Implications) (UK Biobank, 2019); &#010;                  https:\/\/biobank.ndph.ox.ac.uk\/showcase\/ukb\/docs\/biomarker_issues.pdf&#010;                  &#010;                \" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR41\" id=\"ref-link-section-d344843148e2413\" target=\"_blank\" rel=\"noopener\">41<\/a> procedures described on the UKB website.<\/p>\n<p>Data used to define prevalent and incident cases for chronic diseases and common disease risk factors are outlined in Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#MOESM1\" target=\"_blank\" rel=\"noopener\">8<\/a>. Incident chronic disease diagnoses were ascertained using International Classification of Diseases (ICD) diagnosis codes and corresponding dates of diagnosis taken from linked hospital inpatient records and death register data. ICD-9 and ICD-10 data were accessed from the UKB data portal on 30 May 2022, with a censoring date of 30 September 2021, 31 July 2021 or 28 February 2018 for participants recruited in England, Scotland or Wales, respectively (8\u201315\u2009years of follow-up). 
Breast, ovarian and prostate cancer analyses were carried out as sex-specific analyses in female (breast and ovarian) or male (prostate) participants.<\/p>\n<p>Missing data imputation<\/p>\n<p>The average percentages of missing data across all final variables included in our UKB analysis datasets were 11% in women (range: 0\u201379%) and 10.9% in men (range: 0\u201377%). UKB participants recruited from England were randomly assigned to a discovery (n\u2009=\u2009218,446) or replication set (n\u2009=\u2009218,445) while maintaining the same proportion of mortality cases in each. We performed missing data imputation separately in the discovery, replication and Scottish\/Welsh validation (n\u2009=\u200955,676) datasets using the R package missRanger<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 42\" title=\"Mayer, M. missRanger: fast imputation of missing values. R package version 2.1.0 &#010;                  https:\/\/CRAN.R-project.org\/package=missRanger&#010;                  &#010;                 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR42\" id=\"ref-link-section-d344843148e2440\" target=\"_blank\" rel=\"noopener\">42<\/a>, which combines random forest imputation with predictive mean matching. We imputed five datasets, with a maximum of ten iterations for each imputation. We set the maximum number of trees for the random forest to 200, but left all other random forest hyperparameters at their default. The variables used as predictors in the imputation included all baseline, non-nested variables, the Nelson\u2013Aalen estimate of cumulative mortality hazard and the all-cause mortality event indicator. 
All subsequent study analyses were run independently in each of the five imputed datasets, and results were pooled using Rubin\u2019s rule<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 43\" title=\"Marshall, A., Altman, D., Holder, R. &amp; Royston, P. Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines. BMC Med. Res. Methodol. 9, 57 (2009).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR43\" id=\"ref-link-section-d344843148e2444\" target=\"_blank\" rel=\"noopener\">43<\/a>.<\/p>\n<p>XWAS<\/p>\n<p>XWAS of all-cause mortality were initially carried out separately in women and men, and then a final XWAS was calculated in the pooled dataset with both women and men to increase power. Exposures in the final pooled XWAS were limited to those applicable to both women and men, omitting sex-specific reproductive factors (only tested in the sex-specific XWAS). In each XWAS, we serially assessed associations of each individual exposure with all-cause mortality using Cox proportional hazards models with age as the timescale stratified by 5-year birth cohorts and sex (in the pooled analysis only), and adjusted for assessment center, years of education (7\u2009years, 10\u2009years, 13\u2009years, 15\u2009years, 19\u2009years and 20\u2009years) and ethnicity (white, Asian, Black, mixed or other). For each model, the baseline hazards were calculated separately in each of these strata, and resulting effect estimates are those that fit best across all strata. Since it has been shown that UKB participants are likely to misreport alcohol consumption as a function of higher disease burden<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 44\" title=\"Xue, A. et al. 
Genome-wide analyses of behavioural traits are subject to bias by misreports and longitudinal changes. Nat. Commun. 12, 6450 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR44\" id=\"ref-link-section-d344843148e2457\" target=\"_blank\" rel=\"noopener\">44<\/a>, self-reported overall health status was added as an additional XWAS covariate for the self-reported alcohol intake exposure only. P values in the discovery and replication analyses were corrected using the FDR (Benjamini\u2013Hochberg method<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 45\" title=\"Benjamini, Y. &amp; Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289&#x2013;300 (1995).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR45\" id=\"ref-link-section-d344843148e2464\" target=\"_blank\" rel=\"noopener\">45<\/a>) with a significance threshold of FDR P\u2009n\u2009=\u2009436,891) to complete further sensitivity analyses.<\/p>\n<p>Prevalent disease sensitivity analysis<\/p>\n<p>We conducted a sensitivity analysis in the full sample of participants recruited in England (n\u2009=\u2009436,891) where we individually tested every exposure replicated in the pooled mortality XWAS again in relation to mortality using the same XWAS formula and covariates, but now adding an interaction term between each exposure and an indicator of baseline disease or poor health (see definition below). 
We flagged and removed from further analysis any exposure that no longer had a significant direct effect in this model (P\u2009P\u2009<\/p>\n<p>The baseline disease\/poor health indicator was created for all participants, in which participants were coded as having disease or poor health at baseline if they (1) had a linked hospital inpatient ICD diagnosis for any of the chronic illnesses or common disease risk factors studied in our analysis (including hypertension, dyslipidemia and obesity) with a diagnosis date before or on their date of recruitment to the UKB; (2) were assigned a diagnosis code for any of the chronic diseases or common disease risk factors studied in our analysis during the baseline clinical interview (field IDs 20001 and 20002 in Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#MOESM1\" target=\"_blank\" rel=\"noopener\">8<\/a>); (3) self-reported a physician diagnosis of heart attack (field ID 6150), angina (field ID 6150), stroke (field ID 6150), high blood pressure (field ID 6150), bronchitis\/emphysema (field ID 6152), diabetes (field ID 2443) or cancer (field ID 2453); (4) self-reported \u22651 cancer diagnoses (field ID 134); (5) self-reported taking insulin medication (field IDs 6153 and 6177), cholesterol lowering medication (field IDs 6153 and 6177) or blood pressure medication (field IDs 6153 and 6177); or (6) self-reported their overall health status as \u2018poor\u2019 (field ID 2178).<\/p>\n<p>PheWAS of replicated exposures<\/p>\n<p>For all exposures replicated in the XWAS and not removed during the above-described disease sensitivity analyses, a PheWAS was conducted. 
In each PheWAS, the exposure was used as the outcome variable (hereafter referred to as exposure outcomes) and was tested against the full set of baseline phenotypes available in the UKB (Supplementary File <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#MOESM3\" target=\"_blank\" rel=\"noopener\">62<\/a> provides the full list of phenotypes tested). Each PheWAS was conducted as a linear or logistic regression, depending on whether the exposure outcome was continuous or categorical, with covariates for age at recruitment and sex. All ordinal exposure outcomes were tested as continuous variables. Nominal categorical exposure outcomes were recoded into dummy variables for each response category versus the reference. All continuous phenotype exposures were scaled and centered to the mean before running the PheWAS. Summary statistics from all PheWAS are available in Supplementary Files <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#MOESM3\" target=\"_blank\" rel=\"noopener\">63<\/a>\u2013<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#MOESM3\" target=\"_blank\" rel=\"noopener\">178<\/a>.<\/p>\n<p>Proteomic age clock analyses<\/p>\n<p>We serially assessed associations between each exposure and proteomic age gap (the difference in years between plasma protein-predicted age and calendar age) using cross-sectional linear regression models with covariates for sex, age at recruitment, assessment center, years of education and ethnicity. 
In brief, we previously developed a proteomic age clock in a subset of UKB participants (n\u2009=\u200945,441) using a gradient boosting machine learning model that takes 204 proteins we identified and uses them to accurately predict chronological age (Pearson r\u2009=\u20090.94)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 19\" title=\"Argentieri, M. A. et al. Proteomic aging clock predicts mortality and risk of common age-related diseases in diverse populations. Nat. Med. 30, 2450&#x2013;2460 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR19\" id=\"ref-link-section-d344843148e2528\" target=\"_blank\" rel=\"noopener\">19<\/a>. In a validation study involving biobanks in China (n\u2009=\u20093,977) and Finland (n\u2009=\u20091,990), the proteomic age clock showed similar age prediction accuracy (Pearson r\u2009=\u20090.92 and r\u2009=\u20090.94, respectively) compared with its performance in the UKB. The proteomic age clock has been previously associated with the incidence of 18 major chronic diseases (including diseases of the heart, liver, kidney and lung, diabetes, neurodegeneration and cancer), as well as with multimorbidity and all-cause mortality risk.<\/p>\n<p>Correlation and cluster analyses<\/p>\n<p>Correlation between all variables was calculated in the full sample of participants recruited in England using the R package polycor<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 46\" title=\"Fox, J. polycor: polychoric and polyserial correlations. 
R package version 0.7-10 &#010;                  https:\/\/CRAN.R-project.org\/package=polycor&#010;                  &#010;                 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR46\" id=\"ref-link-section-d344843148e2553\" target=\"_blank\" rel=\"noopener\">46<\/a> to create a heterogeneous correlation matrix for each imputed dataset. Correlation coefficients were first calculated within each imputed dataset, transformed to the z-scale via Fisher\u2019s z transformation, pooled via Rubin\u2019s rule and then back-transformed to the original r-scale. We used hierarchical clustering via Euclidean distance to identify the cluster structure of exposures replicated in the pooled XWAS and not susceptible to reverse causation bias (plus education and ethnicity). We used within-cluster sum of squares (WSS) analyses to identify candidates for the optimal number of clusters. We first computed the hierarchical clustering of exposures for different numbers of clusters (k) ranging from 1 to 25. For each k, we then calculated the WSS. We plotted the WSS as a function of k and inspected the plot visually to find the elbow (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#MOESM1\" target=\"_blank\" rel=\"noopener\">2<\/a>). We determined that a seven-cluster solution was the best approximation of the elbow in the WSS curve and represented the most appropriate conceptual groupings of exposures. 
On visual inspection of the hierarchical clustering dendrogram, seven clusters separated the variables well, breaking them into discrete groups with large distances\/heights between clusters.<\/p>\n<p>We further conducted multivariable modeling within each of these seven clusters using the following procedure: (1) all exposures in the cluster were run in a single multivariable mortality Cox model to check for multicollinearity using the variance inflation factor. Exposures with a generalized variance inflation factor<sup>1\/(2\u00d7d.f.)<\/sup> &gt;1.6 were flagged for collinearity and removed. (2) After any collinear variables were removed, all remaining exposures in the cluster were tested together in a single multivariable mortality Cox model using age as the timescale, stratified by 5-year birth cohorts and sex, and adjusted for UKB assessment center, household income (less than \u00a318,000, \u00a318,000\u2013\u00a330,999, \u00a331,000\u2013\u00a351,999, \u00a352,000\u2013\u00a3100,000, greater than \u00a3100,000), education and ethnicity (if those variables were not already in the cluster). Significance in all the cluster multivariable models was determined by a nominal P\u2009<\/p>\n<p>Aging mechanisms and incident chronic disease analyses<\/p>\n<p>Aging biomarker variables (more details in Supplementary Tables <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#MOESM1\" target=\"_blank\" rel=\"noopener\">6<\/a> and <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#MOESM1\" target=\"_blank\" rel=\"noopener\">7<\/a>) were log-transformed and then age-adjusted by regressing each onto age at recruitment separately in women and men. 
Across exposures replicated in the XWAS and passing all sensitivity tests, we serially assessed associations between each exposure and age-adjusted biomarker using cross-sectional linear regression models with covariates for sex, 5-year birth cohort, assessment center, years of education, ethnicity, number of medications, smoking status (current, previous or never) and IPAQ physical activity level (low, moderate or high). Insulin-like growth factor 1 (IGF-1), leukocyte telomere length and vitamin D models included additional covariates for standing height (in cm), leukocyte count (10<sup>9<\/sup> cells per liter) and month of biomarker assessment (to control for seasonality of sun exposure), respectively.<\/p>\n<p>For chronic disease analyses, we serially assessed associations between each exposure (replicated in the mortality XWAS and surviving the disease sensitivity and cluster modeling stages) and incident disease using a Cox proportional hazards model, with all XWAS covariates plus household income, smoking status and IPAQ physical activity group. 
Sex-specific reproductive exposures (for example, menopause) replicated in the female- and male-only XWAS analyses were also tested as exposures in analyses of sex-specific chronic disease outcomes (breast, ovarian and prostate cancer).<\/p>\n<p>For common disease risk factors (obesity, hypertension and dyslipidemia), we serially assessed each exposure and risk factor pair using cross-sectional logistic regression models adjusted for age, sex, assessment center, household income, years of education, ethnicity, smoking status and IPAQ physical activity level.<\/p>\n<p>Across all biomarker, chronic disease, and common disease risk factor analyses, P values were corrected separately for each outcome using FDR.<\/p>\n<p>Calculating PRS<\/p>\n<p>Where possible, we used multiancestry PRS that were previously made available by the UKB (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#MOESM1\" target=\"_blank\" rel=\"noopener\">11<\/a>). Methods for deriving these PRS are described elsewhere<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 47\" title=\"Thompson, D. J. et al. A systematic evaluation of the performance and properties of the UK Biobank Polygenic Risk Score (PRS) Release. PLoS ONE 19, e0307270 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR47\" id=\"ref-link-section-d344843148e2627\" target=\"_blank\" rel=\"noopener\">47<\/a>. For cancer outcomes where no PRS were provided by the UKB, we identified recent PRS from the Polygenic Score (PGS) catalog<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 48\" title=\"Lambert, S. A. et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. 
Nat. Genet. 53, 420&#x2013;425 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR48\" id=\"ref-link-section-d344843148e2631\" target=\"_blank\" rel=\"noopener\">48<\/a>, selecting scores derived in predominantly European populations that did not overlap with the UKB cohort (as no multiancestry scores were available). We calculated these PRS as weighted sums, \u2211(no. risk alleles\u2009\u00d7\u2009effect size) in the UKB v3 imputed genotype data. PGS catalog entries used to calculate PRS were as follows: leukemia (PGS000077) by Graff et al.<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 49\" title=\"Graff, R. E. et al. Cross-cancer evaluation of polygenic risk scores for 16 cancer types in two large cohorts. Nat. Commun. 12, 970 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR49\" id=\"ref-link-section-d344843148e2635\" target=\"_blank\" rel=\"noopener\">49<\/a>, lung cancer (PGS000078) by Graff et al.<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 49\" title=\"Graff, R. E. et al. Cross-cancer evaluation of polygenic risk scores for 16 cancer types in two large cohorts. Nat. Commun. 12, 970 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR49\" id=\"ref-link-section-d344843148e2639\" target=\"_blank\" rel=\"noopener\">49<\/a>, pancreatic cancer (PGS000083) by Graff et al.<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 49\" title=\"Graff, R. E. et al. Cross-cancer evaluation of polygenic risk scores for 16 cancer types in two large cohorts. Nat. Commun. 
12, 970 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR49\" id=\"ref-link-section-d344843148e2644\" target=\"_blank\" rel=\"noopener\">49<\/a>, esophageal cancer (PGS002298) by Choi et al.<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 50\" title=\"Choi, J., Jia, G., Wen, W., Long, J. &amp; Zheng, W. Evaluating polygenic risk scores in assessing risk of nine solid and hematologic cancers in European descendants. Int J. Cancer 147, 3416&#x2013;3423 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR50\" id=\"ref-link-section-d344843148e2648\" target=\"_blank\" rel=\"noopener\">50<\/a>, COPD score (PGS001788) by Wang et al.<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 51\" title=\"Wang, Y. et al. Global Biobank analyses provide lessons for developing polygenic risk scores across diverse cohorts. Cell Genom. 3, 100241 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR51\" id=\"ref-link-section-d344843148e2652\" target=\"_blank\" rel=\"noopener\">51<\/a>, chronic kidney disease (PGS000859) by Mansour Aly et al.<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 52\" title=\"Mansour Aly, D. et al. Genome-wide association analyses highlight etiological differences underlying newly defined subtypes of diabetes. Nat. Genet. 
53, 1534&#x2013;1542 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR52\" id=\"ref-link-section-d344843148e2656\" target=\"_blank\" rel=\"noopener\">52<\/a>, nonalcoholic fatty liver disease (PGS002282) by Schnurr et al.<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 53\" title=\"Schnurr, T. M. et al. Interactions of physical activity, muscular fitness, adiposity, and genetic risk for NAFLD. Hepatol. Commun. 6, 1516&#x2013;1526 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR53\" id=\"ref-link-section-d344843148e2660\" target=\"_blank\" rel=\"noopener\">53<\/a>, liver cirrhosis (PGS000726) by Emdin et al.<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 54\" title=\"Emdin, C. A. et al. Association of genetic variation with cirrhosis: a multi-trait genome-wide association and gene-environment interaction study. Gastroenterology 160, 1620&#x2013;1633.e13 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR54\" id=\"ref-link-section-d344843148e2664\" target=\"_blank\" rel=\"noopener\">54<\/a> and knee osteoarthritis (PGS002729) by Sedaghati-Khayat et al.<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 55\" title=\"Sedaghati-Khayat, B. et al. Risk assessment for hip and knee osteoarthritis using polygenic risk scores. Arthritis Rheumatol. 74, 1488&#x2013;1496 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR55\" id=\"ref-link-section-d344843148e2669\" target=\"_blank\" rel=\"noopener\">55<\/a>. All variants in these scores met our quality control criteria of imputation information &gt;0.4 and minor allele frequency &gt;0.005 in the UKB data. 
Although these new PRS were mostly developed in European populations, we calculated the PRS for our full multiancestry sample and accepted the limitation that the PRS may be slightly misspecified in non-European participants. All PRS were calculated using PLINK version 2.0.<\/p>\n<p>All PRS were coded as quintiles for use in our multivariable models. In all multivariable models including PRS variables, we also added a covariate for genotype array (BiLEVE versus Axiom; field ID 22000) as well as the first 20 genetic principal components published by the UKB (field ID 22009).<\/p>\n<p>Exposome and polygenic risk multivariable models<\/p>\n<p>For each outcome, five multivariable models were calculated. Model 1 includes only age (scaled) and sex. Model 2 includes age, sex and the PRS for the outcome, if available (see below for more detail). Model 3 includes age, sex and all exposures associated with the outcome (exposome). Model 4 includes age, sex, exposome and PRS. If a PRS was not available for a particular outcome, then models 2 and 4 were not calculated for that outcome. Each model was validated in the independent Scottish\/Welsh dataset (n\u2009=\u200955,676) by obtaining the linear predicted values from the models in the English dataset and measuring the C-index and R<sup>2<\/sup> for these values in relation to the outcome rates in the Scottish\/Welsh population. For sex-specific outcomes (breast, ovarian and prostate cancers), we also included in the exposome all sex-specific exposures that were replicated in the female- and male-only mortality XWAS.<\/p>\n<p>The Cox proportional hazards models used for these multivariable models differed slightly from those used in previous analyses, instead using time in study as the timescale, using recruitment age and sex as fixed covariates, and removing the 5-year birth cohort covariate from the model given its collinearity with age. 
In all multivariable Cox models, the proportional hazards assumption was tested by examining the Schoenfeld residuals, and an interaction with time was added to any variable with nonproportional hazards. Survival time splitting for the time interactions in these models was performed using the timeSplitter function from the Greg R package<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 56\" title=\"Gordon, M. &amp; Seifert, R. Greg: regression helper functions. R package version 1.4.0 &#010;                  https:\/\/CRAN.R-project.org\/package=Greg&#010;                  &#010;                 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR56\" id=\"ref-link-section-d344843148e2694\" target=\"_blank\" rel=\"noopener\">56<\/a>, with 2\u2009years as the splitting interval. Any categorical exposure with fewer than ten outcome cases for one of the response levels was excluded from all exposome models for that specific outcome. The only exception was the variable on type of accommodation lived in, where instead we recoded all responses of \u2018mobile or temporary structure (that is, caravan)\u2019 to NA and removed that response level from the variable (since only a few hundred people endorsed this response level in the subset of participants in the multivariable models).<\/p>\n<p>The R2 values for each model were calculated using the CoxR2 package<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 57\" title=\"You, H. &amp; Xu, R. CoxR2: R-squared measure based on partial LR statistic, for the Cox PH regression model. 
R package version 1.0 &#010;                  https:\/\/CRAN.R-project.org\/package=CoxR2&#010;                  &#010;                 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR57\" id=\"ref-link-section-d344843148e2705\" target=\"_blank\" rel=\"noopener\">57<\/a> as a measure of explained randomness based on the partial likelihood ratio statistic under the Cox proportional hazards model<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 58\" title=\"O&#x2019;Quigley, J., Xu, R. &amp; Stare, J. Explained randomness in proportional hazards models. Stat. Med. 24, 479&#x2013;489 (2005).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR58\" id=\"ref-link-section-d344843148e2709\" target=\"_blank\" rel=\"noopener\">58<\/a>. Following previous guidance<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 59\" title=\"Harel, O. The estimation of R2 and adjusted R2 in incomplete data sets using multiple imputation. J. Appl. Stat. 36, 1109&#x2013;1118 (2009).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR59\" id=\"ref-link-section-d344843148e2713\" target=\"_blank\" rel=\"noopener\">59<\/a>, R<sup>2<\/sup> values were first calculated separately within each imputed dataset, converted to r-scale coefficients by taking the square root and then converted to the z-scale using Fisher\u2019s z transformation. The z-transformed values were then averaged across all five imputed datasets. These averages were back-transformed to the r-scale using the inverse z transformation and then squared to return a pooled R<sup>2<\/sup> value. C-index values were also pooled using the same method. 
The relative importance of each variable and category of variables within the multivariable models was calculated from Wald \u03c7<sup>2<\/sup> statistics obtained via analysis of variance (ANOVA) with the rms package in R (ref. <a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 60\" title=\"Harrell, F. E. Jr rms: regression modeling strategies. R package version 6.2-0 &#010;                  https:\/\/CRAN.R-project.org\/package=rms&#010;                  &#010;                 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#ref-CR60\" id=\"ref-link-section-d344843148e2754\" target=\"_blank\" rel=\"noopener\">60<\/a>), where relative importance is the variable\/group \u03c7<sup>2<\/sup> as a proportion of the total model \u03c7<sup>2<\/sup>.<\/p>\n<p>Ethics approval<\/p>\n<p>UKB data use (project application no. 61054) was approved by the UKB according to their established access procedures. The UKB has approval from the North West Multi-center research ethics committee as a Research Tissue Bank, and as such researchers using UKB data do not require separate ethical clearance and can operate under the Research Tissue Bank approval.<\/p>\n<p>Reporting summary<\/p>\n<p>Further information on research design is available in the <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41591-024-03483-9#MOESM2\" target=\"_blank\" rel=\"noopener\">Nature Portfolio Reporting Summary<\/a> linked to this article.<\/p>\n","protected":false},"excerpt":{"rendered":"Study design and participants The UKB is a prospective cohort study with extensive genetic and phenotype data 
available&hellip;\n","protected":false},"author":2,"featured_media":55285,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3846],"tags":[3967,3970,7371,8994,3968,267,6552,19313,29323,19314,19315,29324,2753,70,16,15],"class_list":{"0":"post-55284","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-genetics","8":"tag-biomedicine","9":"tag-cancer-research","10":"tag-diagnosis","11":"tag-epidemiology","12":"tag-general","13":"tag-genetics","14":"tag-infectious-diseases","15":"tag-metabolic-diseases","16":"tag-metabolic-disorders","17":"tag-molecular-medicine","18":"tag-neurosciences","19":"tag-predictive-markers","20":"tag-risk-factors","21":"tag-science","22":"tag-uk","23":"tag-united-kingdom"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@uk\/114411084676823192","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/55284","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/comments?post=55284"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/55284\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media\/55285"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media?parent=55284"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/categories?post=55284"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/tags?po
st=55284"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}