MethSurv : A web tool to perform multivariable survival analysis using DNA methylation data


About

MethSurv is an interactive, user-friendly web portal for univariable and multivariable survival analysis based on DNA methylation biomarkers, using TCGA (The Cancer Genome Atlas) data. The tool requires no programming knowledge and no additional software installation, only a reasonably modern web browser such as Safari, Chrome, Firefox, or Opera. The processed methylation data used in this study were downloaded from FireBrowse. DNA methylation values are represented as beta values (ranging from 0 to 1). The beta value for every CpG site is calculated as M / (M + U + 100), where M and U are the methylated and unmethylated intensities, respectively. We used several R packages, including survival, pheatmap, plyr, and survminer, and integrated the tool with ClustVis for advanced clustering options. The web portal is developed using R Shiny.
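As an illustration of the beta value formula, here is a minimal Python sketch. It is not the code used by MethSurv (which is built in R); the function name and the example intensities are made up for this illustration.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Beta value M / (M + U + offset) for a single CpG site.

    `methylated` (M) and `unmethylated` (U) are signal intensities.
    The offset of 100 comes from the formula in the text; it keeps the
    ratio stable when both intensities are small.
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("intensities must be non-negative")
    return methylated / (methylated + unmethylated + offset)


# Hypothetical intensities, chosen only to show the range of the result.
print(beta_value(900.0, 0.0))    # mostly methylated site
print(beta_value(450.0, 450.0))  # intermediate methylation
print(beta_value(0.0, 900.0))    # unmethylated site
```

Because of the offset in the denominator, the beta value stays strictly below 1 even for a fully methylated site.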

DNA methylation and its relation to cancer survival

DNA methylation is an epigenetic process in which a methyl group is added to the cytosine base of DNA, most commonly at CG dinucleotides (also known as CpG methylation). DNA methylation plays a vital role in cellular growth, differentiation, and disease pathogenesis. Aberrant DNA methylation is often considered one of the characteristic signatures of cancer development. Several studies have highlighted DNA methylation markers associated with differential cancer survival.

Data content

Here we present a dedicated web tool for survival analysis based on DNA methylation data, covering 25 cancer types and 7,358 patients downloaded from TCGA. Table 1 shows, for each cancer in MethSurv, the number of patients, the number of deaths, the median survival, and the covariates used.

| Cancer type | Number of patients | Number of deaths | Median survival (days) | Covariates |
| --- | --- | --- | --- | --- |
| Acute Myeloid Leukemia [LAML] | 194 | 129 | 365.0 | Age, sex |
| Adrenocortical carcinoma [ACC] | 80 | 29 | 1182.5 | Age, sex, stage |
| Bladder Urothelial Carcinoma [BLCA] | 412 | 182 | 535.0 | Age, BMI, sex, grade |
| Brain Lower Grade Glioma [LGG] | 515 | 126 | 674.0 | Age, BMI, sex |
| Breast invasive carcinoma [BRCA] | 782 | 104 | 843.0 | Age, her2, er, pr, stage |
| Cervical squamous cell carcinoma and endocervical adenocarcinoma [CESC] | 307 | 72 | 640.0 | Age, BMI |
| Colon adenocarcinoma [COAD] | 293 | 70 | 676.0 | Age, BMI, sex |
| Esophageal carcinoma [ESCA] | 185 | 77 | 400.0 | Age, BMI, sex, stage, grade |
| Glioblastoma multiforme [GBM] | 139 | 94 | 279.0 | Age, sex |
| Head and Neck squamous cell carcinoma [HNSC] | 527 | 224 | 645.0 | Age, sex, stage |
| Kidney Chromophobe [KICH] | 66 | 10 | 2248.0 | Age, sex |
| Kidney renal clear cell carcinoma [KIRC] | 319 | 105 | 1075.0 | Age, sex, stage |
| Kidney renal papillary cell carcinoma [KIRP] | 275 | 40 | 742.0 | Age, BMI, sex, stage, grade |
| Liver hepatocellular carcinoma [LIHC] | 377 | 132 | 601.0 | Age, BMI, sex, stage |
| Lung adenocarcinoma [LUAD] | 461 | 165 | 626.0 | Age, sex, stage |
| Lung squamous cell carcinoma [LUSC] | 372 | 161 | 660.0 | Age, sex, stage |
| Mesothelioma [MESO] | 87 | 74 | 513.0 | Age, sex, stage |
| Pancreatic adenocarcinoma [PAAD] | 184 | 99 | 467.0 | Age, sex, stage, grade |
| Rectum adenocarcinoma [READ] | 97 | 18 | 744.5 | Age, BMI, sex |
| Sarcoma [SARC] | 261 | 99 | 947.0 | Age, sex |
| Skin Cutaneous Melanoma [SKCM] | 462 | 222 | 1116.5 | *Age, sex |
| Stomach adenocarcinoma [STAD] | 395 | 155 | 449.0 | Age, sex, stage, grade |
| Uterine Carcinosarcoma [UCS] | 57 | 35 | 611.0 | Age, BMI, stage |
| Uterine Corpus Endometrial Carcinoma [UCEC] | 431 | 73 | 788.0 | Age, BMI, stage |
| Uveal Melanoma [UVM] | 80 | 23 | 784.0 | Age, BMI, sex, pathologic_N |

Table 1: Data description of MethSurv

*Height and weight data were insufficient, and hence BMI was excluded from the multivariable model. Abbreviations: BMI = body mass index; her2 = human epidermal growth factor receptor 2; er = estrogen receptor; pr = progesterone receptor; pathologic_N = pathologically determined absence, presence, or extent of regional lymph node (pN) metastasis as defined by the American Joint Committee on Cancer (AJCC).

A large portion of the methylation data in TCGA was profiled using the Infinium HumanMethylation450 BeadChip (HM450K). Therefore, the current version of the tool relies solely on methylation data profiled using HM450K.

This array features 450k CpG sites, covering gene sub-regions including TSS200 (200 bp upstream of the transcription start site, TSS), TSS1500 (1,500 bp upstream of the TSS), 1st exon, 5’UTR, Body, and 3’UTR. It also covers regions defined relative to CpG islands (CGIs): N_Shore and S_Shore (up to 2 kb up- and downstream of the CGI, respectively), N_Shelf and S_Shelf (2-4 kb up- and downstream of the CGI, respectively), and OpenSea (the remaining regions).
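The island-relative categories can be pictured as distance bands around a CpG island. The Python sketch below is only an illustration of the definitions above, not how the array annotation is actually generated. The function name, the signed-offset convention (negative = upstream, positive = downstream, 0 = inside the island), the "Island" label, and the handling of the exact 2 kb and 4 kb boundaries are all assumptions.

```python
def relation_to_island(offset_bp):
    """Classify a CpG by its signed distance (in bp) from the nearest CpG island.

    Assumed convention: negative offsets are upstream (N), positive offsets
    are downstream (S), and 0 means the CpG lies inside the island.
    """
    if offset_bp == 0:
        return "Island"
    side = "N" if offset_bp < 0 else "S"
    distance = abs(offset_bp)
    if distance <= 2000:   # up to 2 kb from the island
        return f"{side}_Shore"
    if distance <= 4000:   # 2-4 kb from the island
        return f"{side}_Shelf"
    return "OpenSea"       # everything further away


for offset in (-1500, 3000, 10000):
    print(offset, relation_to_island(offset))
```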

If you use MethSurv in your research, please cite our paper: V. Modhukur, T. Iljasenko, K. Lokk, T. Metsalu, T. Laisk-Podar, and J. Vilo. MethSurv: A web tool to perform multivariable survival analysis using DNA methylation data. 2017. Epigenomics. DOI: 10.2217/epi-2017-0118 [PMID: 29264942]

Quick start

1. Survival plots tab

Do you have a particular gene and region of interest (gene sub-region and CpG island location), and want to browse the association between survival and DNA methylation levels? Click on the "Survival plots" tab. After choosing the genomic region of interest, you can choose the method used to dichotomize the methylation levels of the patients for survival analysis. The methods are Mean, Median, lower and upper quartile (q25 and q75), and "optimal_by_log rank" (which uses the maxstat method, Maximally Selected Rank Statistics). The "best cut-off point" option runs all of these methods (Mean, Median, lower and upper quartile, maxstat) and outputs the cut-off point that gives the maximum hazard ratio. Next, you can choose whether to perform univariable analysis (by not choosing any covariate) or multivariable analysis (by adding age, gender, etc. from the data to the prediction model). After selection, you will get survival plots, a density plot showing the distribution of patient methylation levels together with the candidate cut-off points for dichotomizing them, and a survival analysis summary. Violin plots provide a visual descriptive analysis, showing how methylation levels vary with patient characteristics (age, gender, stage, etc.) for the chosen CpG. Additionally, you can download the plots in PNG/PDF format and export the methylation data matrix for the entire gene. Links to several external browsers are provided for additional details about the gene. Please note that results are generated only if all the options are selected (gene, island, region, and CpG site).
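To make the simple splitting options concrete, the following Python sketch computes the mean, median, q25, and q75 cut-offs for a list of beta values and assigns each patient to a "lower" or "higher" methylation group. It is a sketch under stated assumptions, not MethSurv's implementation: the function names are hypothetical, the quantile interpolation method is one of several possible choices, values exactly equal to the threshold are assumed to fall in the "lower" group, and the maxstat and "best cut-off point" options are not covered.

```python
import statistics


def candidate_cutoffs(betas):
    """Return the mean, median, q25, and q75 of a list of beta values."""
    q25, q50, q75 = statistics.quantiles(betas, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(betas),
        "median": q50,
        "q25": q25,
        "q75": q75,
    }


def dichotomize(betas, threshold):
    """Label each patient 'higher' if beta > threshold, otherwise 'lower'."""
    return ["higher" if beta > threshold else "lower" for beta in betas]


# Hypothetical beta values for four patients at one CpG site.
betas = [0.1, 0.2, 0.4, 0.8]
cutoffs = candidate_cutoffs(betas)
for name, threshold in cutoffs.items():
    print(name, round(threshold, 3), dichotomize(betas, threshold))
```

Each cut-off produces a different pair of patient groups, which is why the hazard ratio and p-value of the survival comparison depend on the chosen splitting method.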

2. All cancers tab

Do you want to view the survival analysis results for all the CpGs in a gene across all cancers at once? Go to the "All cancers" tab. By clicking "Click for KM plot" for the chosen Gene-CpG-Cancer combination, you can view the survival plot interactively. You can also download the resulting table in tab-delimited format.

3. Top biomarkers tab

Do you want to view the top survival biomarkers for any cancer type? Click on the "Top biomarkers" tab. Here we display the top 100 highly methylated and lowly methylated biomarkers, sorted by p-values and hazard ratios. You can also view the KM plot for any CpG among the top biomarkers by clicking "Click for KM plot". You can also download the resulting table in tab-delimited format.

4. Gene visualization tab

Do you want to know how methylation levels are distributed across the entire gene for the chosen cancer with respect to patient characteristics such as age, BMI, vital status (event), race, and ethnicity? Go to the clustering ("Gene visualization") tab. If you are interested in advanced clustering options, click the ClustVis button to open advanced clustering visualization in the ClustVis tool. You can also download the resulting image in PDF/SVG format.

Tutorial

 

1. Perform survival analysis of individual CpG

To perform survival analysis based on the methylation of a single CpG, choose a cancer type from the drop-down box and search for the “gene symbol” that you are interested in. Then choose the location relative to the CpG island, the gene sub-region, and the CpG from the available options. Next, select the method used to dichotomize methylation levels (Mean, Median, lower and upper quartile, or the best among them). Finally, you can choose whether to adjust for covariates (age, gender, etc.).

NB: Please ensure all the parameters are selected (Cancer type, gene symbol, CpG island location, gene sub-region, CpG id and split method).

On the output page, Kaplan-Meier plots are displayed first, together with information on the query gene, the hazard ratio, and the log-likelihood ratio (LR) test p-value. Next, a density plot visualizes the methylation distribution and the different splitting results. The number shown in red in the density plot denotes the threshold resulting from the currently chosen splitting method.
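For readers unfamiliar with Kaplan-Meier curves, the Python sketch below shows how the survival probability plotted for one patient group is estimated from follow-up times and event indicators. It is a textbook product-limit estimator written for illustration only; MethSurv itself relies on R packages, and the function name and the example data here are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival at each distinct death time.

    times  : follow-up time per patient (e.g. days)
    events : True if the patient died at that time, False if censored
    Returns a list of (time, survival probability) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= leaving
    return curve


# Hypothetical follow-up data for five patients in one methylation group.
times = [5, 8, 8, 12, 20]
events = [True, True, False, True, False]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Running the estimator separately on the "lower" and "higher" methylation groups gives the two curves that a Kaplan-Meier plot compares.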

Below the density plot, we display a violin plot to visualize the methylation levels concerning patient characteristics such as age, gender, etc.

Below the violin plots, the statistical summary of the survival analysis is displayed, along with options to export the analysis results. Additionally, links to external browsers let you look up further gene details in GeneCards, COSMIC, and GO.

2. Retrieve survival analysis details for one or more cancers for all the CpGs in the proximity of a query gene

A user may query a gene of interest and retrieve a table containing the survival summary of all the CpGs within the query gene using the “All cancers” tab. One may also search for a cancer, CpG, or genomic region of interest in the search box as shown. Further, one may retrieve the survival plot for any row of the table by clicking “Click for KM plot”. An example plot is shown below.

3. Retrieve top CpG biomarkers for any cancer

A user may query a cancer type and retrieve a table containing the survival summary of the top CpG biomarkers for the selected cancer type. Further, one may retrieve the survival plot for any row of the displayed table by clicking “Click for KM plot”.

4. Clustering analysis of a query gene

You may query a gene and perform clustering analysis of the individual CpGs within that gene, displayed as a heat map for the chosen “cancer” and “gene”. Seamless integration with ClustVis enables users to perform advanced clustering analysis.

 

Further, one can choose different methods for computing principal components by using ClustVis as shown below.

Similarly, one can explore additional parameters to plot heat map using ClustVis as shown below. 

Data download page

Genome-wide survival analysis results (in tab-delimited format) 

Acute Myeloid Leukemia [LAML] March 2017
Adrenocortical carcinoma [ACC] March 2017
Bladder Urothelial Carcinoma [BLCA] March 2017
Breast invasive carcinoma [BRCA] March 2017
Brain Lower Grade Glioma [LGG] March 2017
Cervical squamous cell carcinoma and endocervical adenocarcinoma [CESC] March 2017
Colon adenocarcinoma [COAD] March 2017
Esophageal carcinoma [ESCA] March 2017
Glioblastoma multiforme [GBM] March 2017
Head and Neck squamous cell carcinoma [HNSC] March 2017
Kidney Chromophobe [KICH] March 2017
Kidney renal clear cell carcinoma [KIRC] March 2017
Kidney renal papillary cell carcinoma [KIRP] March 2017
Liver hepatocellular carcinoma [LIHC] March 2017
Lung adenocarcinoma [LUAD] March 2017
Lung squamous cell carcinoma [LUSC] March 2017
Mesothelioma [MESO] March 2017
Pancreatic adenocarcinoma [PAAD] March 2017
Rectum adenocarcinoma [READ] March 2017
Sarcoma [SARC] March 2017
Skin Cutaneous Melanoma [SKCM] March 2017
Stomach adenocarcinoma [STAD] March 2017
Uterine Carcinosarcoma [UCS] March 2017
Uterine Corpus Endometrial Carcinoma [UCEC] March 2017
Uveal Melanoma [UVM] March 2017

Methylation matrix (in RData format)

Methylation files are provided in "RData" format for reanalysis. We kindly ask users to read the TCGA guidelines for using this data. The methylation data files are large and can therefore take some time to download, depending on the speed of the internet connection.

Acute Myeloid Leukemia [LAML] March 2017
Adrenocortical carcinoma [ACC] March 2017
Bladder Urothelial Carcinoma [BLCA] March 2017
Breast invasive carcinoma [BRCA] March 2017
Brain Lower Grade Glioma [LGG] March 2017
Cervical squamous cell carcinoma and endocervical adenocarcinoma [CESC] March 2017
Colon adenocarcinoma [COAD] March 2017
Esophageal carcinoma [ESCA] March 2017
Glioblastoma multiforme [GBM] March 2017
Head and Neck squamous cell carcinoma [HNSC] March 2017
Kidney Chromophobe [KICH] March 2017
Kidney renal clear cell carcinoma [KIRC] March 2017
Kidney renal papillary cell carcinoma [KIRP] March 2017
Liver hepatocellular carcinoma [LIHC] March 2017
Lung adenocarcinoma [LUAD] March 2017
Lung squamous cell carcinoma [LUSC] March 2017
Mesothelioma [MESO] March 2017
Pancreatic adenocarcinoma [PAAD] March 2017
Rectum adenocarcinoma [READ] March 2017
Sarcoma [SARC] March 2017
Skin Cutaneous Melanoma [SKCM] March 2017
Stomach adenocarcinoma [STAD] March 2017
Uterine Carcinosarcoma [UCS] March 2017
Uterine Corpus Endometrial Carcinoma [UCEC] March 2017
Uveal Melanoma [UVM] March 2017

FAQ

1. Is this tool free to use, and does it require registration?
The tool is open source, free for all users, and requires no registration.

2. How is the performance of this tool?
The web tool currently runs on a high-performance computing cluster. If the cluster is heavily used, the speed may be affected. The tool is expected to perform well in common browsers such as Chrome, Safari, Firefox, and Opera.

3. What does a semicolon between 2 genes mean?
In the HM450K array, a probe may map to more than one gene (overlapping genes). In that case, the gene names are separated by a semicolon.

4. What are the components of MethSurv?
The main components of MethSurv are: i) the "MethSurv" analysis page, which is further divided into a) the "Survival analysis" tab for analysing a single CpG, b) the "All cancers" tab for browsing the survival summary of all the CpGs of a query gene, c) the "Top biomarkers" tab for browsing the top biomarkers for every cancer, and d) the "Gene visualization" tab for generating a heatmap for a query gene and cancer; ii) the "About" page, which gives a short overview of the purpose of the tool and its data contents; iii) the "Quick start" page, a quick tutorial; iv) the "Tutorial" page, which describes the usage of this software in detail; v) the "Download" page; and vi) the "FAQ" page.

5. How do I cite this work?
If you use MethSurv in your research, please cite our paper: V. Modhukur, T. Iljasenko, K. Lokk, T. Metsalu, T. Laisk-Podar, and J. Vilo. MethSurv: A web tool to perform multivariable survival analysis using DNA methylation data. 2017. Epigenomics. DOI: 10.2217/epi-2017-0118 [PMID: 29264942]

6. Why does the FDR in the top predictor tab remain insignificant?
For some cancers, a CpG may no longer be considered significant after the p-value correction for multiple testing. In such cases, it may be more intuitive to focus on the hazard ratio.
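The effect of p-value correction can be seen in a small example. The Python sketch below applies the Benjamini-Hochberg procedure, a common way to control the false discovery rate. Whether MethSurv uses exactly this procedure is an assumption made for illustration, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four CpG sites.
raw = [0.01, 0.04, 0.03, 0.20]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(p, round(q, 4))
```

In this example, raw p-values of 0.03 and 0.04 fall below 0.05, but their adjusted values rise above it. With hundreds of thousands of CpGs tested per cancer, the correction is correspondingly stricter.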

7. How to provide feedback?
The authors are eager to hear the feedback from the users and would be grateful if you use our results in your research. Please contact the app maintainer and developer at modhukur [@] ut .ee

8. How to get results in the survival plots tab without any problems?
Please ensure that all of the input parameters (TCGA cancer datasets, Gene, Relation to island, Genomic Region, Split by, CpG site) are non-empty in the "Survival plots" tab.

9. How to download the results?
Please visit the "Download" tab to download the genome-wide survival analysis results and the methylation data matrix (in RData format) for every cancer.