Study identification

 

Published epigenome-wide association studies (EWAS) are identified through periodic PubMed searches using the journalclub R package. The search terms are "epigenome-wide" OR "epigenome wide" OR "EWAS" OR "genome-wide AND methylation" OR "genome wide AND methylation". The search is also restricted by year of publication (i.e. studies published after 2010) and to studies of human samples.
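As an illustration only, the search described above could be assembled into a single PubMed query string roughly as follows. This is a sketch in Python, not the journalclub implementation: the function name build_pubmed_query is hypothetical, reading the two "AND" terms as a conjunction of two words is an assumption, and so is treating "after 2010" as "2011 onwards". The [dp] (date of publication) and [mh] (MeSH terms) field tags are standard PubMed syntax, but whether the actual search uses them is not stated here.

```python
# Search terms as listed in the text. The two "AND" pairs are grouped in
# parentheses so they combine correctly with the surrounding ORs
# (assumption: each pair is meant as a conjunction of two words).
SEARCH_TERMS = [
    '"epigenome-wide"',
    '"epigenome wide"',
    '"EWAS"',
    '("genome-wide" AND methylation)',
    '("genome wide" AND methylation)',
]


def build_pubmed_query(first_year, humans_only=True):
    """Combine the search terms with publication-year and species filters."""
    query = "(" + " OR ".join(SEARCH_TERMS) + ")"
    # [dp] is PubMed's date-of-publication field; 3000 acts as an open upper bound.
    query += f' AND ("{first_year}"[dp] : "3000"[dp])'
    if humans_only:
        # [mh] restricts the match to MeSH terms (assumed way to select human studies).
        query += " AND humans[mh]"
    return query


if __name__ == "__main__":
    # Assumption: "after 2010" means publications from 2011 onwards.
    print(build_pubmed_query(first_year=2011))
```

Keeping the first year as a parameter avoids hard-coding one reading of "after 2010".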

 

Study eligibility

Study inclusion

Studies are eligible for inclusion in the EWAS Catalog if they:

  • Include at least 100,000 CpG sites in the analysis.
  • Have a sample size of at least 100 individuals.

 

Study exclusion

Studies are not included in the EWAS Catalog if:

  • The DNA methylation data studied are not genome-wide.
  • The study does not include any new EWAS data.
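The inclusion and exclusion criteria above can be read as a single eligibility check. The following Python sketch shows one way to express them. The Study record and its field names are hypothetical, since curation is done manually and no such structure is described here. Only the two thresholds come from the text.

```python
from dataclasses import dataclass

# Thresholds taken from the inclusion criteria.
MIN_CPG_SITES = 100_000
MIN_SAMPLE_SIZE = 100


@dataclass
class Study:
    """Hypothetical summary of a candidate study (field names are illustrative)."""
    n_cpg_sites: int
    sample_size: int
    genome_wide: bool
    has_new_ewas_data: bool


def is_eligible(study):
    """Return True if the study meets both inclusion criteria and no exclusion criterion."""
    meets_inclusion = (
        study.n_cpg_sites >= MIN_CPG_SITES
        and study.sample_size >= MIN_SAMPLE_SIZE
    )
    excluded = (not study.genome_wide) or (not study.has_new_ewas_data)
    return meets_inclusion and not excluded


if __name__ == "__main__":
    candidate = Study(n_cpg_sites=450_000, sample_size=250,
                      genome_wide=True, has_new_ewas_data=True)
    print(is_eligible(candidate))
```

Both thresholds are inclusive ("at least"), so a study with exactly 100 individuals and exactly 100,000 CpG sites passes.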

 

CpG inclusion

Association results with CpGs are eligible for inclusion in the EWAS Catalog if:

  • The association has p < 1×10⁻⁴.
  • The analysis was performed genome-wide.
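As a small worked example of the criteria above, the check below applies the p-value threshold as a strict inequality, so an association with p exactly equal to 1×10⁻⁴ is not included. The function and argument names are hypothetical. The threshold is the one stated in the text.

```python
# Threshold from the CpG inclusion criteria (strictly less than 1e-4).
P_THRESHOLD = 1e-4


def passes_cpg_inclusion(p_value, genome_wide_analysis):
    """Return True if an association meets both CpG inclusion criteria."""
    return genome_wide_analysis and p_value < P_THRESHOLD


if __name__ == "__main__":
    print(passes_cpg_inclusion(3.2e-7, genome_wide_analysis=True))
    print(passes_cpg_inclusion(1e-4, genome_wide_analysis=True))
```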

 

Data extraction

The data for the EWAS Catalog are manually extracted from the published literature. The information extracted falls into four categories: study information, analysis information, participants information, and CpG results.

 

Study information

The following pieces of study and publication information are extracted:

  • Author - the first author of the publication (surname then initials).
  • Consortium - the name of the consortium.
  • PMID - the PubMedID of the publication.
  • Date - the date the paper was published (YY-MM-DD).
  • Trait - the name of the trait.
  • EFO - the corresponding ontology term(s) for the trait.
  • Analysis - description of the analysis performed.
  • Source - the table where the result can be found in the paper.

 

Analysis information

The following pieces of information on the analysis are extracted:

  • Outcome - the outcome of the analysis.
  • Exposure - the exposure of the analysis.
  • Covariates - the covariates adjusted for in the analysis.
  • Outcome_Units - the units of the outcome.
  • Exposure_Units - the units of the exposure.
  • Methylation_Array - the array used to measure the methylation.
  • Tissue - the tissue in which the methylation was measured.
  • Further_Details - any other relevant details of the analysis.

 

Participants information

The following pieces of information on the participants are extracted:

  • N - the total number of participants used in the analysis.
  • N_Cohorts - the total number of cohorts used in the analysis.
  • Categories - the total number of individuals in each category (e.g. 200 smokers, 150 never smokers).
  • Age - the age group participants belonged to.
  • Sex - sex of individuals used in the analysis.
  • Ancestry - ancestry of the individuals used in the analysis.

 

CpG results

The following information on the CpG associations is extracted:

  • CpG - the CpG site.
  • Beta - the effect estimate.
  • SE - the standard error of beta.
  • P - p-value.
  • Details - any additional details on the analysis (e.g. sub-trait).

 

Query options

The EWAS Catalog can be queried with a single term using the main search bar, or with a combination of two terms using the advanced search bar. After a query is entered in the main search bar, a page appears that enables further specification of the search term. There are seven query options:

  • CpG - this presents all results for the CpG site. A CpG ID or an hg19 chromosome:position can be queried.
  • Gene - this presents all CpG results within the gene.
  • Region - this presents all CpG results within the region.
  • Trait - this presents all the CpG results with a trait. This query uses ZOOMA and selects results based on their EFO terms.
  • EFO term - an EFO term can be queried directly as an alternative to querying a trait by name.
  • Author name - this refers to first authors and presents all results from papers by that author.
  • PMID - this presents all results from a specific paper with the queried PubMed ID.

See example queries under the search box.
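To illustrate the two forms a CpG query can take (a CpG ID or an hg19 chromosome:position), the Python sketch below distinguishes them with regular expressions. It is not the catalog's own parser. It assumes Illumina-style IDs of the form "cg" followed by eight digits and positions written as "chr5:373378" or "5:373378". The exact syntax the search bar accepts is not specified here, and the function name is hypothetical.

```python
import re

# Assumed formats: "cg" plus eight digits, or [chr]<chromosome>:<position>.
CPG_ID = re.compile(r"^cg\d{8}$")
POSITION = re.compile(r"^(?:chr)?(\d{1,2}|X|Y):(\d+)$")


def parse_cpg_query(text):
    """Classify a CpG query as a CpG ID or a chromosome:position pair.

    Returns ("cpg_id", id), ("position", (chromosome, position)), or None
    if the text matches neither assumed format.
    """
    text = text.strip()
    if CPG_ID.match(text):
        return ("cpg_id", text)
    match = POSITION.match(text)
    if match:
        return ("position", (match.group(1), int(match.group(2))))
    return None


if __name__ == "__main__":
    for query in ("cg05575921", "chr5:373378", "AHRR"):
        print(query, parse_cpg_query(query))
```

A term that matches neither pattern, such as a gene symbol, returns None and would fall to one of the other query options.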

 

Output

On the screen, a table of results is presented with a subset of columns (or variables) to browse. The full dataset with all variables for the query is available to download. This file is a tab-delimited TSV file with the same variables as in the downloadable catalog.
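A downloaded results file can be read with any tool that handles tab-delimited text. The Python sketch below uses the standard csv module and converts the Beta, SE and P columns to numbers. The column names follow the CpG results fields listed earlier, but their exact spelling in the download is an assumption, as is the use of "NA" for missing values. The sample rows are made up for illustration and are not catalog results.

```python
import csv
import io

# Columns assumed to hold numeric values (names as in the CpG results list).
NUMERIC_COLUMNS = ("Beta", "SE", "P")


def read_results(handle):
    """Read tab-delimited results into a list of dicts with numeric fields parsed."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for row in reader:
        for column in NUMERIC_COLUMNS:
            value = row.get(column)
            # Assumption: missing values appear as empty strings or "NA".
            row[column] = float(value) if value not in (None, "", "NA") else None
        rows.append(row)
    return rows


# Made-up example content, not real catalog data.
SAMPLE = (
    "CpG\tBeta\tSE\tP\n"
    "cg00000001\t-0.12\t0.02\t2e-9\n"
    "cg00000002\t0.05\tNA\t3e-5\n"
)

if __name__ == "__main__":
    for row in read_results(io.StringIO(SAMPLE)):
        print(row["CpG"], row["Beta"], row["P"])
```

For a real download, the same function can be passed an open file, for example open(path, newline="") with the path to the saved TSV file.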