Tutorial: How to create a mutation landscape (waterfall) plot with GenVisR
gravatar for Obi Griffith
2.4 years ago by
Obi Griffith16k
Washington University, St Louis, USA
Obi Griffith16k wrote:

This tutorial makes use of the GenVisR package. Please cite:

Skidmore ZL, Wagner AH, Lesurf R, Campbell KM, Kunisaki J, Griffith OL, Griffith M. 2016. GenVisR: Genomic Visualizations in R. Bioinformatics. pii: btw325. [Epub ahead of print]
PubMed | Bioinformatics Journal | BioRxiv | Bioconductor | GitHub

Note: A more comprehensive version of this tutorial is also available as part of our Genomic Visualization and Interpretation course at www.genviz.org.

Upon sequencing a set of tumor samples, from small to large cohorts, a common first step is to analyze/visualize the recurrence and co-occurrence of gene-level mutations. If you look in almost any recent TCGA or ICGC paper you will find plots similar to the example below where a matrix of genes by samples is shown with mutation type indicated by coloring of each matrix cell. It is also often common to add sidebars for relevant clinical details, gene mutation rate, and/or sample mutation burden. There are some web-based tools that now allow you to create such views for pre-loaded data. For example, the cBioPortal provides such visualizations for much of their pre-loaded TCGA and ICGC data and their excellent OncoPrinter tool also allows custom data input. However, in many cases, producing publication-ready waterfall plots requires further customization. Such custom plots have historically been created through ad hoc R plotting or even manually in Excel (yikes!). In other cases, automated generation of plots for multiple sets of genes (e.g., pathway by pathway views) is desired. To address the needs for automation, customization and accessibility we have created the GenVisR package for Genomic Visualizations in R. The waterfall function is just one of many convenient functions for the production of highly customizable publication quality graphics for genomic data primarily at the cohort level.

In this tutorial we will demonstrate the use of the GenVisR waterfall function. We will recreate such a plot as that shown below and recently published in Ma et al (2015).

waterfall plot from Ma et al (2015)

The first required step is to install GenVisR. First, make sure that you have the latest version of R (3.3.0 or later) available from CRAN and launch an R session. GenVisR is available through BioConductor and can be installed by the usual method. At an R prompt, we will install GenVisR and load the GenVisR library as follows:


Now, lets get the mutation data for Ma et al 2015. This is available as Supplementary Table S3 at the paper's Supplementary Data page. I opened this excel file and saved it as a tab-delimited text file for import into R. Take note of where you saved that file and import it into R. The read.table function is a useful tool for this purpose.

mutation_data=read.table(file="~/Downloads/152934_1_supp_3139930_n6h2q6.txt", header=TRUE, sep="\t")

We will need to rename the column headings for “patient”, “gene name”, and “trv type” to one of the sets of supported column headings (see ?waterfall for examples). Note that the trv_types (mutation types) in this case follow the style of the MGI annotator as documented here. Other formats (standard MAF or custom) are also supported.


Finally, let's run the waterfall command. Note, a few options are specified to add x-axis labels and cell labels.

pdf(file="~/Dropbox/BioStars/GenVisR_waterfall_example1.pdf", width=12, height=8)
waterfall(mutation_data, fileType="MGI", mainXlabel=TRUE, mainLabelCol="amino.acid.change", mainLabelSize=2)

You should see a waterfall plot very similar to that published in Ma et al (2015).

enter image description here

GenVisR Tutorials currently available at BioStars:

ADD COMMENTlink modified 10 weeks ago • written 2.4 years ago by Obi Griffith16k

Thanks a zillion!

I scanned quickly the paper trying to find the meaning behind: Normal.ref.count, Normal.var.count, Normal.VAF, Tumor.ref.count, Tumor.var.count, Tumor.VAF. Unfortunately I couldn't. Sorry if this sounds a very simple question but VAF can have a lot of meaning in my genetic dictionary ;) and count could also mean a lot of things to me ... so, would you kindly explain it?

Thank you again

ADD REPLYlink written 2.4 years ago by H.Hasani590

VAF in this case refers to variant allele fraction (frequency). "ref.count" is the number of reads supporting the reference allele. "var.count" is the number of reads supporting the variant allele. VAF = var.count/(ref.count+var.count).

ADD REPLYlink written 2.4 years ago by Obi Griffith16k

Dear Obi,

Can we add the percentage of mutation for each gene per sample on this plot?

Kind Regards, Rahel

ADD REPLYlink written 19 months ago by rahel1435020

I'm not sure what you are asking. Do you mean the variant allele frequency for each gene/sample? As a text value in each cell? Or, something else?

ADD REPLYlink written 19 months ago by Obi Griffith16k

Dear Obi, Many thanks for your reply. Yes, I mean that to add the VAF to each cell. I have this AF value in mutect2 out put and also %frequency from Varscan2 output, I figured out that I can add them as separate column in my input data and then add it to the watefall command, but I am not sure if it is correct. Is there any way to extract the VAF values for each cell (gene per sample) from GenVisR and tell the command to write the VAF value on each cell? Kind regards, Rahel

ADD REPLYlink written 19 months ago by rahel1435020

Yes. Just look at the format of the "mutation_data" object. If you can add another column with VAF values then you should be able to just specify mainLabelCol="X" where X is the name of your new VAF column rather instead of "amino.acid.change".

ADD REPLYlink written 19 months ago by Obi Griffith16k

Hi Griffith,

The mutation data which I'm using has


frameshift indel    inframe indel     missense SNV     nonsense SNV         stoploss 
        1352               354            10206              808                5 
  synonymous SNV 

Is filetype "Custom" ? And when I run the data I got the following error:

most_deleterious <- c("missense SNV","frameshift indel","nonsense SNV","inframe indel","stoploss","synonymous SNV")

waterfall(SMutations, fileType="Custom", variant_class_order = most_deleterious, mainXlabel=TRUE, mainLabelCol="vaf", mainLabelSize=2, rmvSilent = TRUE)

Checking if input is properly formatted...
Detected "Custom" file_type flag, looking for correct column names...
Calculating frequency of mutations...
Removing silent mutations...
setting mutation hierarchy...
Error in waterfall_hierarchyTRV(data_frame, fileType, variant_class_order) : 

Detected an invalid mutation type, valid values for Custom are: missense SNV, frameshift indel, nonsense SNV, inframe indel, stoploss, synonymous SNV
In addition: Warning message:
In waterfall_Custom2anno(x, label_col) :
 Did not detect silent mutations in input, is this expected?

P.S: I see that in "Variant_class" there are also "NA"s. Could you please help me in this.

Thanks in Advance

ADD REPLYlink modified 18 months ago • written 18 months ago by Bioinfo220

Hi dsp,

you mention the original input has NA's, are those in the "variant_class" column? If so adding NA to the most_deleterious variable should resolve the issue: most_deleterious <- c("missense SNV","frameshift indel","nonsense SNV","inframe indel","stoploss","synonymous SNV", NA)

ADD REPLYlink written 18 months ago by zlskidmore280

Thanks for the reply. It worked.

ADD REPLYlink written 18 months ago by Bioinfo220

Dear Griffith, Is there a way to plot the "Mutations per MB" in a separate plot not as a part of Waterfall using GeneVisR? Something like figure 1 in Alexandrov et al., 2016 https://d2ufo47lrtsv5s.cloudfront.net/content/sci/354/6312/618/F1.large.jpg Please let me know if I had to make a new question for this ... Many thanks in advance, Rahel

ADD REPLYlink modified 18 months ago • written 18 months ago by rahel1435020

Hi Rahel,

The link which you posted is not working. If you want to give an image Go to (https://imgsafe.org/) and browse the image u need, copy the link.

Above the comment box, click the image and Paste the link you copied before. Better create a new question.

Thank you

ADD REPLYlink written 18 months ago by Bioinfo220

Many thanks for your advice. I did create a new question for that: Plotting numbers of somatic substitutions per megabase (MB)

ADD REPLYlink written 18 months ago by rahel1435020

Dear Obi, I have some cases that do not have mutation for my candidate genes (NaN or NA), but I still want these cases to be present in the plot,do you know how should I format the input data?

ADD REPLYlink written 15 months ago by xiekunlin19860

Depending on the version of GenVisR you are using there are two options, using GenVisR's internal dataset as an example:

option 1. use the plot samples parameter

samples <- c(as.character(unique(brcaMAF$Tumor_Sample_Barcode)), "test1", "test2")
waterfall(brcaMAF, mainRecurCutoff=.1, plotSamples=samples)

option 2. make a dataframe with those samples, NA out everything else

tmp <- data.frame("Tumor_Sample_Barcode"=c("test1", "test2"))
tmp2 <- plyr::rbind.fill(tmp, brcaMAF)
waterfall(tmp2, mainRecurCutoff=.1)
ADD REPLYlink modified 15 months ago • written 15 months ago by zlskidmore280

Thanks for kind explanation! I would like to ask a one more question. Is there any other option to enlarge the size of sample name, I mean, the size of x-axis label?

Best, Caroline

ADD REPLYlink written 15 months ago by carolinie03060

You could do that indirectly, because all of the plots in GenVisR are based on ggplot2 you can just add a layer, something like this should work

 # load libraries

# create ggplot2 layer to alter axis text size (in this case we're overriding an existing layer, so its good to set the angle as well)
layer <- list(theme(axis.text.x=element_text(size=20, angle=90)))

# pass layer to GenVisR
waterfall(brcaMAF, mainRecurCutoff=.1, mainLayer=layer)
ADD REPLYlink modified 15 months ago • written 15 months ago by zlskidmore280

Oh, I should try it! Thanks a lot :)

ADD REPLYlink written 15 months ago by carolinie03060
gravatar for Ming Tang
2.4 years ago by
Ming Tang2.3k
Houston/MD Anderson Cancer Center
Ming Tang2.3k wrote:

In the complexHeatmap package, there is a function called oncoprint to do similar stuff. https://bioconductor.org/packages/release/bioc/vignettes/ComplexHeatmap/inst/doc/s8.oncoprint.html

ADD COMMENTlink written 2.4 years ago by Ming Tang2.3k

Yes, I have used OncoPrint and it is very powerful. ComplexHeamap, generally, is one of the best R packages that has ever been created.

ADD REPLYlink written 9 months ago by Kevin Blighe24k
gravatar for poisonAlien
2.4 years ago by
poisonAlien2.5k wrote:

check out maftools. Oncplot wraps around oncorprint script from CmplexHeatmap but directly works on maf files (mutation data).

ADD COMMENTlink modified 2.4 years ago • written 2.4 years ago by poisonAlien2.5k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 579 users visited in the last hour