Study design

The studies that we analyzed included: five retrospective case control studies [14, 16, 21, 24, 25], two cross-sectional studies [18, 19], and two prospective cohort studies [17, 20]. Each of these designs allowed to stratify patients according to the severity of symptoms and to perform differential methylation analyses on diagnostic samples. The number of patients in the studies largely differed from n = 731 and n = 473 patients in [19, 25], respectively to n = 14 [18] (Table 2).

Table 2 Patient characteristics in the included studies

Patient characteristics in analyzed studies

Age of the patient is one of the strongest risk factors associated with the severity of the infection outcome [4]. Also, methylation patterns of blood cells change with age [27]. Thus, it is plausible that age of the patients will influence methylome and needs to be considered in analysis aiming to identify methylation changes associated with the severity of the infection. Age of the patients in the studies that we considered in our analysis significantly differed. Specifically, in the “youngest” cohort included in the study, the median age was 42 and ranged between 19 and 61 [14]. In the “oldest” cohort, the mean age of participants in discovery cohort was 67 and 76 in mild and severe cases, respectively. The participants in the replication cohort included in this study had an average age of 61 in mild group and of 61 in severe group [19] (Table 2).

Also, different ethnic groups have been shown to display ethnicity-related methylation signatures in peripheral blood [28]. Moreover, specific ethnic groups have been reported to display different risks of acquiring SARS-CoV-2, as well as severity of clinical outcomes [29]. Four studies included in our analysis specified race or ethnic origin of the participants [14, 17, 20, 24]. Specifically, a study [17] included mild cases of the following ethnicity: White (38.5%), Black (8.8%), Asian (1.8%), Hispanic (28.1%) and other ethnicity groups (22.8%). Severe group in this study consisted of White (53.3%), Black (13.3%), Asian (2.2%) and Hispanic (11.1%) subjects or cases declaring other ethnicities (20.1%). In the second study [14], entire cohort consisted of subjects from West-Eurasia (72%) and contained also cases from Central-South America (discovery cohort: 18%, validation cohort: 33%). The third study [24] involved COVID-19 cases that were mostly Hispanic or Latino (53.7%) and with the following ancestry: American Indian/Alaskan Native (0.6%), Asian (7.3%), Black/African American (15.2%), Native Hawaiian/Other Pacific Islander (1.2%), White (28%) or other declared (42.7%), missing (4.9%). In fourth study [20], most of the participants originated from South Asia. Mild group consisted of cases from Middle East (2.6%), North Africa (1.3%), Northeast Africa (5.2%), South Asia (82.8%), Southeast Asia (6.5%) and Western Asia (1.3%), whereas in severe group participants declared ethnicity of Middle East (4.1%), Northeast Africa (8.2%), South Asia (75%), Southeast Asia (8.2%) and Western Asia (4.1%). The study that was based on reported and new patient cohort [25] and included patients from Poland, Spain and USA, also did not take into account the ethnicity of the participants in the analysis.

Overall, the majority of the cohorts in analyzed studies which reported ethnicity consisted of Caucasians or populations of mixed descent. Two studies [14, 20] involved patients predominantly of Asian and West-Eurasian descent (Table 2).

It has also been shown that morbidity and mortality from COVID-19 are higher in men than in women and, therefore, sex of the patients should be considered in the analysis of methylation changes induced by the virus [30, 31]. In general, studies that we found eligible for our analysis were not balanced in terms of gender, with males being over-represented among the patients (Table 2). For example, in the study [20] all subjects with severe COVID-19 were men and male participants comprised the majority of severe cases in the study [17].

Definition of severity of the infection

A number of indexes, such as WHO Clinical Progression Scale [32], COVID-19 Global Risk Assessment Model (COVID-GRAM) score [17] or Acute Respiratory Distress Syndrome (ARDS) Berlin severity index [33] have been developed with the aim to unify the assessment of severity of SARS-CoV-2 infection. However, none of those scales entered general clinical use.

The studies that we analyzed did not utilize unified assessment of the severity of the outcomes (Table 3). Specifically, in the study [19] subjects were stratified using WHO Clinical Progression Scale [32], where mild and severe cases had WHO score of 1 to 4 (ambulatory or hospitalized, mild symptoms) and 5 to 8 (hospitalized with severe symptoms or dead), respectively. The study [17] classified subjects according to the risk of severe disease outcome using dichotomized GRAM-score risk scale [17], with GRAM score under 50% and over 50% denoting lower and higher risk of severe COVID-19 course, respectively. The study [20] largely differed from other studies, because it only included cases with COVID-19 and acute respiratory distress syndrome under mechanical ventilation. In this study survivals of 60-day follow-up period were considered mild cases. The study [24] did not stratify subjects into mild and severe groups, but focused on the development of a methylation score predicting whether a patient was: discharged from the emergency department, admitted to the hospital, accepted to ICU or died.

Table 3 Definition of severity of COVID-19 in the included studies

The study [16] defined mild cases as those not requiring hospitalization and without neither fever nor dyspnea, and severe cases as those hospitalized because of SARS-CoV-2 manifesting: fever higher than 38 °C, dyspnea, cough and fatigue. In the study [14] mild group consisted of asymptomatic or paucisymptomatic subjects which were not hospitalized, whereas severe group included patients that were either hospitalized with oxygen therapy, such as nasal mask or prongs, or admitted to intensive care unit (ICU) with the need for mechanical ventilation, additional organ support or extra corporeal membrane oxygenation (ECMO). In the study [18] mild cases were those experiencing mild fatigue, low fever, discrete olfactory and taste disturbances (a according to the guideline on the management of COVID-19 published by the National Health Commission of the People’s Republic of China, version 8.0 [18]) and no visible signs of pneumonia in imaging, whereas severe cases presented with dyspnea, oxygen saturation lower than 93% or arterial blood oxygen partial pressure/oxygen concentration lower than or equal to 40 kPa. In the study [21] patients admitted to ICU or dead from the infection were classified as severe and those discharged after diagnosis were classified as cases with mild symptoms. In the study [25], mild cases were those not requiring hospitalization and severe group consisted of hospitalized subjects. The overview of the severity criteria in analyzed studies is shown in Table 3.

Technology used for genome-wide methylation screening

The MethylationEPIC BeadChip (Infinium) microarray is a technology that has become sufficiently cost-efficient to warrant of genome-wide methylation studies in clinical settings that in most of the cases are based on large patient cohorts. Also, it has been reported that to achieve a level of precision broadly comparable to the methylation array, a minimum coverage of 100x in NGS studies is recommended [34]. Adding to utility of this technology in studies based on large cohorts of subjects. The studies of SARS CoV-2-related methylation changes that we included in our analysis also utilized this technology. Most of the studies used standard chip design. One study utilized microarray with additional probes in loci within or spanning regions encoding important components of immunity, such as major histocompatibility complex (MHC) or known human leukocyte antigen (HLA) alleles [24].

Clinical material source for the analysis

Data showing that in some instances blood cells may be infected by the SARS-CoV-2 have been reported [35], but in general terms blood cells are not targeted by the virus. However, the antiviral defense is triggered by immune system response and identified in blood of COVID-19 patients methylation changes are a part of host methylome response to the infection. Thus, major target in the analysis aiming to develop biomarkers of the severity of the infection are cells of the immune system, methylomes of which respond to the infection. Eight of the studies that we included in our analysis were based on DNA extracted from diagnostic peripheral blood sample and one utilized peripheral blood mononuclear cells (PBMCs).

Clinical material processing for the methylation analysis

Despite that DNA extraction methods have been well standardized; the use of different DNA extraction methods needs to be considered when comparing the results of studies. Two studies included in our analysis did not specify the DNA extraction kit used [14, 21], two of the studies used QIAamp DNA Blood Mini Kit (QIAGEN Diagnostics GmbH, Hilden, Germany) [18, 19], one QIAsymphony DSP DNA Mini Kit (QIAGEN Diagnostics GmbH, Hilden, Germany) [16], one Allprep DNA/RNA mini kit (QIAGEN Diagnostics GmbH, Hilden, Germany) [20], one GeneJET whole blood kit (ThermoFisher Scientific) [17], one PureLink™ Genomic DNA Mini Kit for Polish original cohort in study [25], and one bead-based, automated extraction Maxwell(R) RSC System (Promega) [24]. According to manufacturer recommendations, minimum DNA input for EPIC microarray methylation profiling should be between 250 ng and 1000 ng, which in the beginning of the experiment is subjected to bisulfite modification. Although it is safe to assume that all of the analyzed studies followed the official recommendations, two of them specified input of DNA used for bisulfite modification, a necessary step in MethylationEPIC BeadChip (Infinium) microarray-based methylation screening. The amounts of the DNA used in those studies were in the recommended range: 500 ng in study [19] and 900 ng in study [21].

The quantity and quality of DNA used in methylation analysis is a major limitation of the analysis. Although manufacturer makes specific recommendations for the DNA quality and quantity assessment method to be used before hybridization to the microarray, five studies that we analyzed [14, 16, 18, 20, 25] did not report the DNA quality assessment method used. In one study DNA quality assessment was performed with ultraviolet (UV) visible (VIS) spectrophotometry (N60 Implen Nanophotometer) [21], one utilized Qubit [17] and two used both Qubit and spectrophotometry [19, 24].

Also, the microarray manufacturer specifically recommends one of the specific Zymo Research kits (EZ-96 DNA Methylation Kit, EZ DNA Methylation Kit, EZ-96 DNA Methylation-Lightning MagPrep Kit and EZ DNA Methylation-Lightning Automation Kit) for bisulfite conversion in EPIC microarray protocol. Four of the studies that we analyzed [16, 19, 21, 24] reported the use of the recommended kits, others did not specify the bisulfite modification method used.

Pipelines for methylation microarray data pre-processing

Before statistical methods can be used to identify methylation changes in the microarray-based studies, the microarray data need to be pre-processed to numeric values representing methylation levels at screened CpG sites. A number of the pre-processing bioinformatics pipelines has been developed for MethylationEPIC BeadChip (Infinium) microarray data and studies selected for our analysis utilized eight of them, specifically: minfi in [14, 16, 17, 19, 20], DMRcate in [16], meffil in [19], ChAMP in [14, 25], Illumina Genome Studio in [18], RnBeads in [21], maxprobes in [20] and SeSAME in [24]. Each of these pipelines may differ in type of the normalization method, implementation of cell fraction and batch effect correction, and the use of those elements of the data pre-processing can significantly influence the data analysis outcomes.

Type of normalization used

Due to the design, Illumina microarrays contain two types of probes and the methylation level measurements derived from those two types of probes need to be normalized before downstream analyses to avoid probe design bias [36]. Several normalization methods have been developed and, in the studies selected for our analysis, six different methods were used, including: functional normalization [16, 17, 19], ssNoob [14], background correction [18], BMIQ [21, 25], stratified quantile normalization (SQN) [20] and noob [24]. The lack of coherence of normalization techniques between compared studies might have influenced the results of the analyses and number of the methylation changes identified and thus we attempted to benchmark methods used at each step of data analysis.

Cell fraction correction

When analyzing a blood sample consisting of mixture of different cell types, proportions of which may change during viral infection, it is essential to consider sample composition in the data analysis [37]. Methods that allow to adjust methylation level readouts for cell fraction proportions in specific sample have become standard in the analysis of the methylation patterns in blood samples.

Correction of beta values, which are methylation level measurements scale in BeadChip technology, for cell fraction proportions during data preprocessing is considered the most accurate approach when accounting for blood cellular heterogeneity [38]. However, beta values were not directly adjusted during data preprocessing for calculated cell type proportions in any of the analyzed here studies except for study [25] where robust partial correlation method implemented in EpiDISH R package was used. However, four studies included inferred cell type proportions as covariates in statistical models used for the identification of the methylation changes. Specifically, in a study [19] cell type proportions were calculated using robust partial correlation method from EpiDISH R package. Then, the estimated cell type proportions were used to adjust linear regression model in differential methylation analysis between mild and severe patient groups. The study [16] used white blood cell counts estimated with Horvath’s DNAmAge calculator [39] as covariates in a linear regression model fitted to identify methylation differences between mild and severe cases. In the study [21] cellular composition was also calculated using DNAmAge calculator and estimated cell type proportions were used as covariates in a differential methylation analysis between patients displaying mild and severe symptoms which were based on linear model (limma). In the study [17] leukocyte proportions were inferred from methylation levels and included as covariates in the statistical linear regression model used in differential methylation analysis. However, in this study cell fraction estimation method was not specified. In the study [20] cell type proportions were inferred with FlowSorted.Blood.EPIC R package, however, these proportions were not included as covariates in differential methylation analysis. Three studies that we included in our analyses did not address sample heterogeneity in the analyses [14, 18, 24].

Batch effect correction

One other source of the interexperimental variation in the analysis of Methylation EPIC BeadChip microarray data is inert technical variation observed between experiments ran on different microarray batches called “batch effect” [37]. This limitation can also influence results of microarray data pre-processing and should be addressed in the data analysis pipeline. The ComBat algorithm was developed to address this limitation and can be used in experimental designs where batch effect is expected [37]. Only one of the studies that we analyzed implemented ComBat correction to minimize technical variation [17]. The sample randomization, which also effectively minimizes technical variation, was used in the study [24]. In the study [21] batch effect was assessed with PCA, however, authors did not specify if this covariate was used in the analysis. Authors of one study performed evaluation of the batch bias and did not recognize the need for batch effect correction [19]. Remaining studies did not consider batch effect in the data analysis [14, 16, 18, 20]. In the study [25] batch effect was assessed but there was no indication of its influence on data.

Statistics used for the identification of the methylation changes

The methylation changes can be reported as changes at specific CpG sites (Differentially Methylated Positions– DMPs) or changes over a genomic region with a number of adjacent CpG sites displaying similar methylation levels (Differentially Methylated Regions– DMR). Overall, each of the studies we analyzed here utilized different statistical approaches for the identification of infection-related methylation changes.

Specifically, in the study [16] where no methylation changes between mild (n = 48) and severe (n = 61) were identified, linear regression adjusted for sex and age from limma R package (Bonferroni-corrected p 

The study [17] utilized two-stage approach to identify methylation changes. First, 19 DMRs were identified using DMRcate R package, with DMR defined as region with more than 3 CpG sites displaying FDR-corrected p 

In the study [14] linear regression (limma R package) adjusted for sex and age was used to identify 44 CpG sites with statistically significant (Benjamini-Hochberg-corrected p 

The study [18], reported 245 methylation changes between severe (n = 7) and mild (n = 7) cases that were identified with Illumina Genome Studio software. Only methylation changes larger than absolute mean beta value of 0.17 and absolute diffscore larger than 13 (corresponding to p 18].

In the study that included largest cohort of patients n = 473 COVID-19 [19], two separate analyses were performed to identify severity-associated methylation changes. In the first analysis 20 CpGs were identified using linear regression adjusted for age and gender, between mild and severe patients, defined according to WHO clinical ordinal scale with mild cases being: PCR positive individuals (ambulatory or hospitalized with mild symptoms, 1–4 scales), and severe PCR positive individuals (hospitalized with severe symptoms or died, 5–8 scales). In the second analysis the WHO-based severity score was considered as continuous variable in linear regression model adjusted with age and sex and this approach identified 257 DMPs. In both analyses methylation changes were considered significant when displayed nominal p-value lower than 0.01 and when meta-analysis p-value (computed with restricted maximum likelihood and fixed effects methods implemented in metafor R package) was lower than 5e-08.

The study [21] reported 21 methylation changes between 123 mild and 64 severe COVID-19 patients using multi-step procedure. First, that procedure identified 880 DMPs with linear regression (R, limma, age- and sex-adjusted, p p = 3.4 × 10− 5, hypergeometric test). Then, five clusters were compared to each other to select DMPs specific to severe subjects (exact specification how the analysis was performed was not reported) and 21 DMPs was selected to constitute final signature.

The study [20] also utilized two procedures to identify severity-related methylation changes. Here, authors reported 49 methylation changes identified with spline regression (splineDiffExprs function from splineTimeR package) which analyzed combined methylomics data from four timepoints (T1 – admission to the ICU, T2 – day 7– T2, T3 – day 14, T4 – day 21) between patents who either recovered (n = 24) or died (n = 12) from infection at day 60. As well as 13 DMPs predicting outcome-related signature identified using univariate Cox proportional hazard model with Z-score transformed methylation data and hospital stay as a time variable in RegParallel R package. The study [24] which included methylomics data from 460 subjects (SARS-CoV-2 positive: n = 164, SARS-CoV-2 negative: n = 296) utilized an algorithm that did not aim to find DNA methylation signatures, but perform subject classification according to the symptoms and outcomes. Specifically, in this study elastic net penalized regressions (glmnet, R) were used with multiple hyperparameter sets to all autosomal CpG probes that passed QC with the aim to obtain a methylation score predictive of patient outcome measured by self-constructed ordinal severity scale that included: dischargement from emergency department, admission to hospital, progression to ICU and death. Lastly, in the study [25] methylation changes associated with hospitalization were identified using logit regression model adjusted for age and sex.

The number and the magnitude of the identified methylation changes

The largest number of CpG sites associated with infection-induced methylation changes identified in the studies that we analyzed was 257 [19]. Overall, the studies reported from eight [20] to 77 differentially methylated CpG sites [17]. One study did not report any methylation changes associated with COVID-19 symptoms severity [16] (Table 4).

Table 4 Methylation changes identified in the included studies

Two of the analyzed studies did not report the magnitude of methylation changes between cases and controls. First of those studies [24] aimed to perform classification of the subjects according to the symptoms and outcome and, thus, did not aim to find specific severity-related signature. The second [20] classified CpGs as differentially methylated only with the use of p-value cutoff without considering the magnitude of the methylation changes. In the study, that reported the largest number of methylation changes, the level of methylation difference at the identified CpG sites was more than 17% points [18]. Two of the studies [14] and [25] focused only on methylation changes larger than 5% points. The methylation level differences reported in remaining studies were less than 5% points (Table 4 and in electronic Additional File 2).

Reproducibility of COVID-19 severity related methylation changes between independent studies

The major requirement for a robust and clinically applicable biomarker is to be reproducibly detectable in independent cohorts of patients. Unfortunately, only one study included in our benchmarking analysis [25] that at the same time reanalyzed data from previous publications, reported 2 methylation changes that were also found in previous study [14] and 11 changes that were reported in study [19]. This study detected a subset of methylation changes across independent cohorts of patients from Poland [25], Spain [14] and two locations in USA [17, 24], but utilized uniform data analysis methodology that was different from the methods used in original publication of data reanalyzed in this study. This clearly indicates that coherent data analysis workflow is essential in analysis of SARS-CoV-2-related and most likely also other disease-related methylation changes. The details of comparison COVID-19 severity-related methylation changes between analyzed studies are provided in Additional File 3 (Table 1).