The MULTI Consortium

The MULTI Consortium is an ongoing initiative to integrate and consolidate existing multi-organ and multi-omics data, including imaging, genetics, metabolomics and proteomics. Building on existing consortia and studies, MULTI aims to curate and harmonize the data to model human ageing and disease at scale across the lifespan. Refer to Supplementary Table 1 for comprehensive information, including the complete list of data analysed and their respective sample characteristics. The participants provided written informed consent to the corresponding studies. The MULTI Consortium is approved by the Institutional Review Board at Columbia University (AAAV6751).

UKBB

UKBB46 is a population-based research initiative comprising around 500,000 individuals from the United Kingdom between 2006 and 2010. Ethical approval for the UKBB study has been secured, and information about the ethics committee can be found online (https://www.ukbiobank.ac.uk/learn-more-about-uk-biobank/governance/ethics-advisory-committee). The main sleep data used in this study were sleep duration (field ID: 1160) based on a self-reported questionnaire collected from all 500,000 participants at the baseline. The 7 brain MRIBAGs were derived from multi-organ MRI data at the second visit, 11 ProtBAGs and 5 MetBAGs were derived using plasma proteomics and metabolomics at the baseline. Finally, we also included individual plasma proteins and metabolites in our ProWAS and MetWAS, as described below.

For the primary variable of interest in the UKBB, sleep duration (data-field: 1160) was assessed using an ACE touchscreen questionnaire asking “About how many hours sleep do you get in every 24 h? (please include naps).” The participants entered a numeric value, which underwent basic quality control: responses of less than 1 h or more than 23 h were rejected, and values below 3 h or above 12 h triggered a confirmation prompt. If participants clicked the ‘Help’ button, they were instructed that, if their sleep duration varied substantially, they should report the average number of hours slept in a 24-hour day over the past 4 weeks. For this variable, the value −1 indicates ‘Do not know’ and the value −3 indicates ‘Prefer not to answer’; these were excluded in the current work.

For the multi-organ IDPs, we used multi-organ MRI data from eight organ systems and tissues (category ID: 100003), including the brain, heart, liver, pancreas, spleen, adipose and kidney, as well as eye OCT features. The MUSE atlas-derived brain IDPs from the T1-weighted MRI10 were used for the brain MRIBAG generation. We also used neural networks to analyse the raw cardiac MRI images in our previous study and returned them to the UKBB to extract heart-specific IDPs (category ID: 157), which were used to derive the heart MRIBAG. For the other organs’ IDPs, we used the pre-derived features from the UKBB (category ID: 105). For the plasma proteomics data, we downloaded the original data (category ID: 1838), which were analysed and made available to the community by the UKBB Pharma Proteomics Project (UKB-PPP)47. The initial quality control procedures were described in the original work48; we conducted additional quality-check steps as outlined in the ‘Proteome-wide associations’ section. We also imputed missing normalized protein expression values and defined the organ-specific proteins using the HPA platform (https://www.proteinatlas.org/), as detailed in our previous work. For the plasma metabolomics data, we downloaded the original data (category ID: 220), which were analysed and made available to the community by Nightingale Health in the UKBB. Additional quality check analyses were performed as detailed in the ‘Metabolome-wide associations’ section.

FinnGen

The FinnGen49 study is a large-scale genomics initiative that has analysed over 500,000 Finnish biobank samples and correlated genetic variation with health data to understand disease mechanisms and predispositions. The project is a collaboration between research organizations and biobanks within Finland and international industry partners. For the benefit of research, FinnGen generously made its GWAS findings accessible to the wider scientific community (https://www.finngen.fi/en/access_results). This research used the publicly released GWAS summary statistics (version R9), which became available on 11 May 2022, after harmonization by the consortium. No individual data were used in the current study.

FinnGen published the R9 version of GWAS summary statistics via REGENIE software (v.2.2.4)50, covering 2,272 DEs, including 2,269 binary traits and 3 quantitative traits. The GWAS model included covariates such as age, sex, the initial 10 genetic principal components and the genotyping batch. Genotype imputation was referenced on the population-specific SISu v.4.0 panel. We included GWAS summary statistics for 521 FinnGen DEs in our analyses (Supplementary Table 7).

PGC

PGC51 is an international collaboration of researchers studying the genetic basis of psychiatric disorders. PGC aims to identify and understand the genetic factors contributing to various psychiatric disorders such as schizophrenia, bipolar disorder, MDD and others. The GWAS summary statistics were acquired from the PGC website (https://pgc.unc.edu/for-researchers/download-results/), underwent quality checks and were harmonized to ensure seamless integration into our analysis. No individual data were used from PGC. Each study detailed its specific GWAS models and methodologies, and the consortium consolidated the release of GWAS summary statistics derived from individual studies. In the current study, we included summary data for 6 brain diseases (Supplementary Table 7).

TriNetX

To evaluate real-world clinical outcomes associated with sleep traits, we used the TriNetX52 database (https://trinetx.com/)—a global federated health research platform that provides access to deidentified electronic medical records from over 70 healthcare organizations, encompassing more than 90 million patients. The TriNetX platform integrates clinical data, including diagnoses, medications, procedures and laboratory results, enabling large-scale observational analyses. We used this resource to assess associations between insomnia and hypersomnia and systemic disease outcomes across organ systems identified in the UKBB for short and long sleep duration (Fig. 3).

Baltimore longitudinal study of ageing

The main goal of the BLSA is to understand the normal ageing process. Tracking physiological and cognitive changes over time aims to identify risk factors for age-related diseases, study patterns of decline and identify predictors of healthy ageing. BLSA53,54 brain MRI, self-reported, and actigraphy-derived sleep duration (n = 385) were used to compare and replicate the U-shaped pattern observed in the main analysis for the brain MRIBAG. For self-reported sleep duration, sleep duration was assessed using a standardized questionnaire item asking “On average, in the past month, how many hours of sleep did you get each night?”. The participants select from ordered categorical response options reflecting typical nightly sleep duration: more than 7 h; more than 6 h up to 7 h; more than 5 h up to 6 h; or 5 h or fewer. This measure captures habitual sleep duration over the previous month and represents participants’ perceived average nightly sleep. We also included participants who underwent wrist actigraphy. They were instructed to wear an Actiwatch-2 wrist actigraphy (Philips-Respironics) on their non-dominant wrist for seven consecutive 24 h periods. The devices continuously recorded activity counts and ambient light levels, and the participants were asked to press the event marker each time they intended to go to sleep (lights out) and when they got up to start their day55.

The Multi-Ethnic Study of Atherosclerosis

The Multi-Ethnic Study of Atherosclerosis (MESA)21,22,56 is a medical research initiative involving over 6,000 men and women from six US communities. For this analysis, we included 573 participants with available brain MRI data, self-reported sleep duration measures, and relevant covariates such as age, sex and BMI. We leveraged the MESA cohort to attempt replication of the U-shaped association observed in the UKBB for the brain MRIBAG. Self-reported sleep duration is obtained from the Exam 5 Sleep Questionnaire, in which the participants report their habitual bedtimes and wake times separately for weekdays and weekends. The questionnaire includes items asking for bedtime and waketime in 24 h format, from which MESA derives weekday sleep duration and weekend sleep duration expressed in hours and minutes.

GAM models the relationship between sleep duration and organ ageing clocks, IDPs, proteins and metabolites

To model the association between sleep duration and the 23 multi-organ BAGs and any other phenotype, we implemented GAMs using the mgcv package in R. This approach enabled flexible modelling of complex relationships, whether linear, flat, sigmoidal or U-shaped, without requiring any prior assumptions about the underlying shape of the fitted curves. We adjusted for key demographic and physiological covariates (that is, age, sex, weight, height, waist circumference, BMI, assessment centre, diastole, systolic blood pressure, time differences for data collection (for MRIBAGs) and disease status). The participants reporting extremely short durations or those with missing sleep duration were excluded. The analysis was restricted to those reporting 4–10 h of sleep to reduce the influence of outliers. For each BAG, we fitted GAM with cubic regression splines (bs = ‘cr’) to smooth the nonlinear association between sleep duration and the BAG, considering sex and sex–sleep interaction term.

Model selection was conducted by systematically evaluating combinations of smoothing dimensions (k = 3, 5, 10, 15, 20) and distribution families (Gaussian, t, and gamma). For each candidate model, the estimated e.d.f. and smoothing parameters were optimized internally through penalized maximum-likelihood estimation. The optimal model was defined as the one yielding the lowest Akaike information criterion, indicating the best balance between model fit and complexity, while ensuring that the e.d.f. did not approach the upper limit of the specified k, thereby avoiding overfitting. We tested (1) the main effect (sleep duration; P1); (2) sex difference in population mean (P2); and (3) sex–sleep interaction terms (P3) on each BAG. The solid curves in Fig. 1 depict estimated BAG, while shaded bands represent the 95% CI. Supplementary Fig. 1 provides diagnostic plots of model fit for the optimal model. The raw BAG values were normalized to the range (0, 1) to allow for the application of different family distributions in the model.

Twenty-three multi-organ ageing clocks

In our previous work, we processed raw brain MRI data from the UKBB to extract 119 grey matter regions of interest (ROIs) from T1-weighted images, which were used to compute the brain MRIBAG. For the heart MRIBAG, we used 80 cardiac MRI-derived traits from a previous study57, which we had incorporated in an earlier study. Moreover, five abdominal MRIBAGs were derived from abdominal MRI data (category ID = 105) across multiple studies58,59,60,61,62,63, yielding a total of 7 MRIBAGs1. We also developed 11 ProtBAGs using UKBB plasma proteomics data2, along with 5 MetBAGs3 based on plasma metabolomics profiles. All 23 BAGs were developed using a nested cross-validation framework, adhering to best practices in machine learning to minimize overfitting and prevent data leakage2,64.

In our previous studies1,2,3, we described in detail the population definition and cross-validation procedures used for model training, which we summarize here. Applying a coherent machine learning framework, we assessed the performance of age-prediction models. Hyperparameter tuning was performed through nested, repeated holdout cross-validation with 50 repetitions (80% training/validation and 20% testing). Specifically, we performed a grid search for fine-tuning model-specific hyperparameters. The within-distribution, holdout test dataset was held out to unbiasedly evaluate model performance (different from the 20% test dataset from the cross-validation).

To rigorously train the age-prediction models, we first defined participants without any pathologies based on ICD code and clinical history as CN. We further split the CN into the following datasets:

  • CN within-distribution, holdout test dataset: 500 participants were randomly drawn from the CN population. Within-distribution, holdout test datasets are ideal for objectively evaluating machine learning model performance, especially in studies with large sample sizes, such as the UKBB.

  • CN training/validation dataset: 80% of the remaining CN population was used for the inner loop tenfold cross-validation for hyperparameter selection.

  • CN cross-validated test dataset: 20% of the remaining CN population was used for the outer-loop 50 repetitions.

  • Patient dataset: all patients who have at least one ICD-10-based diagnosis or clinical history.

The CN training/validation/test datasets were used for model development and were used with a nested cross-validation procedure for all machine learning models (LASSO regression and support vector regressor, elastic net and neural network), whereas the within-distribution, holdout test set provided an unbiased assessment of model performance. Model evaluation metrics included MAE and Pearson’s r. Age bias correction was applied using the approach outlined previously65.

IDP-wide associations

We assessed the association between sleep duration and 720 IDPs covering 8 organ systems and tissues using UKBB in vivo imaging data. For each IDP, we fitted the same GAM as in the ‘GAM models the relationship between sleep duration and organ ageing clocks, IDPs, proteins and metabolites’ section to capture the associations with sleep duration. These models included age, sex, body mass index, height, weight, waist circumference, assessment centre, blood pressure and disease status as covariates. Moreover, organ-specific covariates were incorporated, such as brain positioning in the scanner (lateral, transverse and longitudinal), head motion and intracranial volume for the brain IDPs. Outlier values in the IDP outcome variables, defined as ±4 s.d. from the mean, were removed to minimize the influence of extreme values. For each IDP, we extracted effect estimates and tested for the main effects of sleep, sex differences and sex-specific interactions. Predicted curves and 95% confidence intervals were generated separately for male and female individuals. When significant (P < 0.05/720), sex-specific turning points (sample minimum values of sleep duration) were also identified from the fitted curves. Descriptions of these 720 IDPs are provided in our previous study1,66 and in Supplementary Data 2.

Proteome-wide associations

Our analysis focused on the first instance of the proteomics data (“instance” = 0). We then integrated Olink files containing coding information, batch numbers, assay details and limit of detection (LOD) data (category ID: 1839) by matching them to the proteomics dataset ID. Finally, we excluded normalized protein expression values that fell below the protein-specific LOD. Descriptions of these proteins are provided in our previous study2 and in Supplementary Data 3.

We conducted ProWASs by linking sleep duration to 2,923 unique plasma proteins measured using the same GAM model. The GAM was adjusted for common covariates, including age, sex, weight, height, waist circumference, BMI, assessment centre, disease status, diastolic and systolic blood pressure, protein batch number, LOD and the first 40 genetic principal components. Multiple-testing corrections were applied using Bonferroni adjustment (P < 0.05/2,923). Given the substantial correlation structure among proteomic measures, we acknowledge this choice is conservative; we retain Bonferroni correction in the main text to minimize false positives. To identify and exclude extreme outliers, we defined an upper threshold as the mean plus 4 s.d. for each protein. We mainly focus on the 342 organ-enriched proteins defined in our previous study2.

Metabolome-wide associations

The original data (category ID: 220) were (1) calibrated absolute concentrations (or ratios) and not raw NMR spectra; and (2) before release, had already been subject to quality control procedures by Nightingale Health67. After the quality-check procedures described previously68, we performed additional quality-check steps to remove a range of unwanted technical variations, including shipping batch, 96-well plate, well position, aliquoting robot and aliquot tip. We focused our analysis on the first instance of the metabolomics data (“instance” = 0). The analysis included 327 metabolites (comprising both small molecules and lipoprotein measures), of which 107 were non-derived raw metabolites, and the remainder were composite metabolites, across 274,247 participants. Descriptions of these metabolites are provided in our previous study3 and in Supplementary Data 4.

We conducted MetWASs by linking sleep duration to 327 plasma metabolites. The GAM controlled common covariates, including age, sex, weight, height, waist circumference, BMI, assessment centre, disease status, diastolic and systolic blood pressure, and the first 40 genetic principal components. Multiple-testing corrections were applied using Bonferroni adjustment (P < 0.05/327). To identify and exclude extreme outliers, we defined an upper threshold as the mean plus 4 s.d. for each metabolite. We mainly focus on the 107 organ-associated metabolites defined in our previous study3.

Genetic analyses

We used the genotype and imputed genotype data from UKBB for all genetic analyses. Our quality-check pipeline focused on European ancestry in the UKBB (6,477,810 SNPs passing quality checks). We summarize our genetic quality check steps. First, we excluded related individuals (up to second degree) from the complete UKBB sample using the KING software for family relationship inference69. We then removed duplicated variants from all 22 autosomal chromosomes. Individuals whose genetically identified sex did not match their self-acknowledged sex were removed. Other exclusion criteria were (1) individuals with more than 3% of missing genotypes; (2) variants with minor allele frequency (MAF; dosage mode) of less than 1%; (3) variants with larger than 3% missing genotyping rate; (4) variants that failed the Hardy–Weinberg test at 1 × 10−10. To further adjust for population stratification70, we derived the first 40 genetic principal components using the FlashPCA software71. Details of the genetic quality check protocol have been described elsewhere9,10,72,73,74.

GWAS

Given the large sample sizes for both short versus normal (16,872 short and 300,420 normal) and long versus normal (25,049 long and 300,420 normal) sleep duration comparisons, we used REGENIE50 for GWAS, as it is well-suited for large-scale genetic analyses due to its computational efficiency and ability to control for population structure and relatedness, outperforming alternatives like PLINK and fastGWA in these settings. Our GWAS adjusted common covariates, including age, disease status, age-squared, sex, interactions of age with sex, BMI, waist circumference, standing height, weight, systolic/diastolic blood pressure and the first 40 genetic principal components. We applied a genome-wide significance threshold (5 × 10−8) to annotate the significant independent genomic loci.

Annotation of genomic loci

For all GWASs, genomic loci were annotated using FUMA75. For genomic loci annotation, FUMA initially identified lead SNPs (correlation r2 ≤ 0.1, distance <250 kb) and assigned them to non-overlapping genomic loci. The lead SNP with the lowest P value (that is, the top lead SNP) represented the genomic locus. Further details on the definitions of top lead SNP, lead SNP, independent significant SNP and candidate SNP can be found in Supplementary Note 2.

MAGMA tissue expression analysis

MAGMA gene-property analysis24 was conducted using gene expression data from the GTEx (v.8) to investigate tissue-specific associations with genetic variants. In contrast to differential gene expression enrichment tests focusing solely on prioritized genes, MAGMA leverages the full distribution of SNP P values across the genome, providing a more comprehensive assessment of how genetic signals relate to gene expression patterns in various tissues.

Genetic correlation

We estimated the genetic correlation (gc) using the LDSC23 software between the two abnormal sleep duration patterns and 527 DEs from the FinnGen and PGC datasets. We used precomputed LD scores from the 1000 Genomes of European ancestry, maintaining the default settings for other parameters in LDSC. Note that LDSC corrects for sample overlap, ensuring an unbiased genetic correlation estimate76. Statistical significance was determined using Bonferroni correction.

Survival analyses for risk of DEs and all-cause mortality

In the UKBB, we evaluated the predictive value of abnormal sleep duration patterns, specifically short sleep (<6 h) and long sleep (>8 h), through two sets of analyses (1) survival analysis to predict the future onset of incident single DEs defined by ICD-10 codes; and (2) survival analysis to estimate the longitudinal risk of all-cause mortality.

Survival analysis for ICD-based single DE

We used a Cox proportional hazard model while adjusting for covariates to test the associations of short/long sleep duration, compared to normal sleep duration (6–8 h), with the time to incident of ICD-based single disease entities. Notably, we excluded individuals with any disease diagnosis (except the disease of interest) to ensure the analysis was restricted to a disease-free population. The covariates age, sex, body mass index, height, weight, waist circumference and blood pressure were included as additional right-side variables in the model. To train the model, the time variable was determined by calculating the difference between the date of diagnosis of the disease for cases (or the censoring date for non-cases) and the date attending the assessment centre (field ID: 53). The participants who were diagnosed for a specific disease of interest after enrolling in the study were classified as cases; non-cases were defined by participants without any disease diagnoses.

Survival analysis for mortality risk

We used a Cox proportional hazard model while adjusting for covariates to test the associations of short/long sleep duration patterns with all-cause mortality. The covariates age, sex, body mass index, height, weight, waist circumference and blood pressure were included as additional right-side variables in the model. The hazard ratio, exp(βR), was calculated and reported as the effect size measure that indicates the influence of each biomarker on the risk of mortality. To train the model, the time variable was determined by calculating the difference between the date of death (field ID: 40000) for cases (or the censoring date for non-cases) and the date attending the assessment centre (field ID: 53). Participants who passed away after enrolling in the study were classified as cases.

SEM for mediation analysis

Using UKBB data, we used structural equation modelling (SEM)66 to examine whether organ-specific MRIBAGs, measured at the second imaging visit (2014 or later), serve as mediators in the relationship between sleep duration, assessed at the baseline visit (2007–2010), and two distinct subtypes of LLD (LLD1 and LLD2), also evaluated at the second visit. Sleep duration was categorized into binary groups reflecting short (<6 h) and long sleep (>8 h) patterns, with respect to the normal sleep duration (6–8 h).

For each MRIBAG, we specified a mediation model (sleep duration → MRIBAG → LLD) that included (1) a direct path from sleep duration to LLD subtype (c2); and (2) an indirect path from sleep duration to MRIBAG (a1), and from MRIBAG to LLD subtype (c1), with the product term (a1 × c1) representing the mediated (indirect) effect. Models were adjusted for relevant covariates, including age at assessment, sex, weight, standing height, waist circumference, BMI, diastolic blood pressure and systolic blood pressure. To assess the robustness of mediational directionality considering the time-ordering of events, we also tested reversed models (sleep duration → LLD1/2 → brain MRIBAG) to scrutinize potential inverse mediation. Significance was determined using a Bonferroni-corrected threshold (P < 0.05/7) to account for multiple comparisons across the 7 organ systems. All model estimates, including direct, indirect, total effects and proportion mediated, are reported in the Supplementary Information.

MR tests whether DEs are causally linked to short or long sleep duration relative to normal sleep duration

We conducted two-sample MR linking 525 DEs (as exposures) from FinnGen and PGC to short and long sleep duration as outcomes; however, limited statistical power prevented testing the reverse direction (sleep duration → DEs), which was partially tested by the mediation analysis (not in a strict sense of causal inference for SEM).

We used a two-sample MR approach implemented in the TwoSampleMR package77 to infer the causal relationships. We used five distinct MR methods, including the IVW method, Egger, weighted median, simple mode and weighted mode estimators. The STROBE-MR Statement78 guided our analyses to increase transparency and reproducibility, encompassing the selection of exposure and outcome variables, reporting statistics and implementing sensitivity checks to identify potential violations of underlying assumptions. First, we performed an unbiased quality check on the GWAS summary statistics. Notably, the absence of population overlapping bias79 was confirmed, given that FinnGen and UKBB participants largely represent populations of European ancestry without explicit overlap with UKBB. PGC GWAS summary data were ensured to exclude UKBB participants. Furthermore, all consortium GWAS summary statistics were based on or lifted to GRCh37. Subsequently, we selected the effective exposure variables by assessing the statistical power of the exposure GWAS summary statistics in terms of instrumental variables (IVs), ensuring that the number of IVs exceeded 7. Crucially, the function clump_data was applied to the exposure GWAS data, considering LD. The function harmonise_data was then used to harmonize the GWAS summary statistics of the exposure and outcome variables. Bonferroni correction was applied to all tested traits based on the number of effective DEs.

Finally, we conducted multiple sensitivity analyses. First, we conducted a heterogeneity test to scrutinize potential violations of the IV’s assumptions. To assess horizontal pleiotropy, which indicates the IV’s exclusivity assumption80, we used a funnel plot, single-SNP MR methods and the Egger estimator. Furthermore, we performed a leave-one-out analysis, systematically excluding one instrument (SNP/IV) at a time, to gauge the sensitivity of the results to individual SNPs. Notably, our MR analyses rely on the standard assumption of linear genetic effects on sleep duration, providing an average causal effect per unit increment in the exposure. As such, we conducted the sleep GWAS using binary traits (long sleep versus normal sleep and short sleep versus normal sleep), rather than treating sleep duration as a continuous variable. Nevertheless, the MR estimates do not directly represent the nonlinear U-shaped sleep–BAG relationships observed in our phenotypic analyses and should be interpreted as complementary to, rather than a direct mirror of, the observational sleep–BAG associations.

Ethics statement

All data used in this study were obtained from previously approved research cohorts and biobanks. The MULTI Consortium has been approved by the Institutional Review Board at Columbia University (IRB protocol: AAAV6751). Individual contributing studies received approval from their respective institutional review boards. All research was performed in accordance with relevant guidelines and regulations, and written informed consent was obtained from all participants in each study.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.