Published epigenome-wide association studies are identified through periodic PubMed searches. The search terms are "epigenome-wide" OR "epigenome wide" OR "EWAS" OR "genome-wide AND methylation" OR "genome wide AND methylation". Searches are also restricted by year of publication (i.e. studies published after 2010) and to studies of human samples.
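As an illustration of how the search terms above combine, the following minimal sketch assembles them into a single boolean search string. The function name and the choice to quote each phrase and parenthesise the AND pairs are assumptions made for this example; the year and human-sample restrictions are left out because the text does not say how they are applied.

```python
# Search phrases taken from the text above.
PHRASES = ["epigenome-wide", "epigenome wide", "EWAS"]
# Pairs that must both appear (the "X AND methylation" terms).
AND_PAIRS = [("genome-wide", "methylation"), ("genome wide", "methylation")]


def build_search_term():
    """Join all terms with OR (hypothetical helper, not the catalog's own code)."""
    parts = [f'"{phrase}"' for phrase in PHRASES]
    parts += [f'("{a}" AND "{b}")' for a, b in AND_PAIRS]
    return " OR ".join(parts)


search_term = build_search_term()
print(search_term)
```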
Studies are eligible for inclusion in the EWAS Catalog if they:
- Include at least 100,000 CpG sites in the analysis.
- Have a sample size of at least 100 individuals.
Studies are not included in the EWAS Catalog if:
- The DNA methylation data studied are not genome-wide.
- The study does not include any new EWAS data.
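The study-level criteria above can be expressed as a simple check. This is a minimal sketch only: the `Study` class and its field names are hypothetical, since in practice the criteria are applied during manual curation.

```python
from dataclasses import dataclass

MIN_CPG_SITES = 100_000  # at least 100,000 CpG sites in the analysis
MIN_SAMPLE_SIZE = 100    # at least 100 individuals


@dataclass
class Study:
    # Hypothetical field names chosen for this example.
    n_cpg_sites: int
    sample_size: int
    genome_wide: bool
    has_new_ewas_data: bool


def study_is_eligible(study):
    """Return True only if the study meets every inclusion criterion."""
    return (
        study.n_cpg_sites >= MIN_CPG_SITES
        and study.sample_size >= MIN_SAMPLE_SIZE
        and study.genome_wide
        and study.has_new_ewas_data
    )
```

For example, a genome-wide study of 250 individuals with new EWAS data passes, while the same study with 99 individuals does not.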
Association results with CpGs are eligible for inclusion in the EWAS Catalog if:
- The association has p < 1×10⁻⁴.
- The analysis was performed genome-wide.
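The association-level criteria can be sketched in the same way. The dictionary keys and the CpG identifiers below are made up for illustration. The comparison is strict, so a result with p exactly equal to the threshold is not kept.

```python
P_THRESHOLD = 1e-4  # associations must have p below this value


def eligible_associations(results, genome_wide=True):
    """Keep results with p < 1e-4 from a genome-wide analysis.

    `results` is a list of dicts with a "P" key (an assumed layout).
    """
    if not genome_wide:
        return []
    return [r for r in results if r["P"] < P_THRESHOLD]


# Made-up example values, not real catalog data.
example = [
    {"CpG": "cg00000001", "P": 3e-8},
    {"CpG": "cg00000002", "P": 1e-4},
    {"CpG": "cg00000003", "P": 0.02},
]
kept = eligible_associations(example)
```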
The data for the EWAS Catalog are manually extracted from the published literature. The information extracted falls into four categories: study information, analysis information, participant information, and CpG results.
The following pieces of study and publication information are extracted:
- Author - the first author of the publication (surname then initials).
- Consortium - the name of the consortium.
- PMID - the PubMedID of the publication.
- Date - the date the paper was published (YY-MM-DD).
- Trait - the name of the trait.
- EFO - the corresponding ontology term(s) for the trait.
- Analysis - description of the analysis performed.
- Source - the table where the result can be found in the paper.
The following pieces of information on the analysis are extracted:
- Outcome - the outcome of the analysis.
- Exposure - the exposure of the analysis.
- Covariates - the covariates adjusted for in the analysis.
- Outcome_Units - the units of the outcome.
- Exposure_Units - the units of the exposure.
- Methylation_Array - the array used to measure the methylation.
- Tissue - the tissue in which the methylation was measured.
- Further_Details - any other relevant details of the analysis.
The following pieces of information on the participants are extracted:
- N - the total number of participants used in the analysis.
- N_Cohorts - the total number of cohorts used in the analysis.
- Categories - the total number of individuals in each category (e.g. 200 smokers, 150 never smokers).
- Age - the mean age of the participants in years.
- N_Males - the total number of males used in the analysis.
- N_Females - the total number of females used in the analysis.
- N_EUR - the total number of European participants.
- N_EAS - the total number of East Asian participants.
- N_SAS - the total number of South Asian participants.
- N_AFR - the total number of African participants.
- N_AMR - the total number of Admixed American participants.
- N_OTH - the total number of participants of non-EUR, non-EAS, non-SAS, non-AFR, non-AMR ancestry.
The following pieces of information on the CpG associations are extracted:
- CpG - the CpG site.
- Beta - the effect estimate.
- SE - the standard error of beta.
- P - p-value.
- Details - any additional details on the analysis (e.g. sub-trait).
The EWAS Catalog has four query options:
- CpG - this presents all results for the CpG site. A CpG ID or an hg19 chromosome:position can be queried.
- Gene - this presents all CpG results within the gene.
- Region - this presents all CpG results within the region.
- Trait - this presents all the CpG results associated with a trait. This query uses ZOOMA and selects results based on their EFO terms.
See example queries under the search box.
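To illustrate the two accepted forms of a CpG query, the sketch below distinguishes a CpG ID from a chromosome:position string. Both patterns are assumptions made for this example ("cg" followed by eight digits for an ID, and an optional "chr" prefix for a position); the formats the website actually accepts may differ, so the example queries under the search box remain the reference.

```python
import re

# Assumed patterns, not taken from the catalog's own code.
CPG_ID = re.compile(r"^cg\d{8}$")
POSITION = re.compile(r"^(?:chr)?([0-9]{1,2}|X|Y):(\d+)$", re.IGNORECASE)


def classify_cpg_query(query):
    """Label a query as a CpG ID, a chromosome:position pair, or unknown."""
    q = query.strip()
    if CPG_ID.match(q):
        return ("cpg_id", q)
    match = POSITION.match(q)
    if match:
        # Return the chromosome as text and the position as an integer.
        return ("position", (match.group(1).upper(), int(match.group(2))))
    return ("unknown", q)
```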
A table of results is presented on screen with a subset of columns (or variables) to browse. The full dataset with all variables for the query is available to download. This file is a tab-delimited TSV file with the same variables as in the downloadable catalog.
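A downloaded results file can be read with any tab-aware parser. The following minimal sketch uses Python's standard `csv` module. It assumes the column headers match the variable names listed above (CpG, Beta, SE, P) and that missing values appear as empty strings or "NA"; both points should be checked against a real download. The sample rows are made up.

```python
import csv
import io

NUMERIC_COLUMNS = ("Beta", "SE", "P")  # assumed header names


def read_results(handle):
    """Read tab-delimited results and convert numeric columns to float."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for col in NUMERIC_COLUMNS:
            value = row.get(col)
            # Treat absent, empty, or "NA" entries as missing.
            row[col] = float(value) if value not in (None, "", "NA") else None
        rows.append(row)
    return rows


# Made-up sample standing in for a downloaded file.
sample = (
    "CpG\tBeta\tSE\tP\n"
    "cg00000001\t-0.02\t0.004\t5e-7\n"
    "cg00000002\t0.01\tNA\t2e-5\n"
)
rows = read_results(io.StringIO(sample))
```

For a file on disk, an open file handle can be passed in place of the `io.StringIO` object, for example `open(path, newline="")` with `path` pointing to the download.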