Tutorial: How to create a mutation landscape (waterfall) plot with GenVisR
gravatar for Obi Griffith
16 months ago by
Obi Griffith15k
Washington University, St Louis, USA
Obi Griffith15k wrote:

This tutorial makes use of the GenVisR package. Please cite:

Skidmore ZL, Wagner AH, Lesurf R, Campbell KM, Kunisaki J, Griffith OL, Griffith M. 2016. GenVisR: Genomic Visualizations in R. Bioinformatics. pii: btw325. [Epub ahead of print]
PubMed | Bioinformatics Journal | BioRxiv | Bioconductor | GitHub

Upon sequencing a set of tumor samples, from small to large cohorts, a common first step is to analyze/visualize the recurrence and co-occurrence of gene-level mutations. If you look in almost any recent TCGA or ICGC paper you will find plots similar to the example below where a matrix of genes by samples is shown with mutation type indicated by coloring of each matrix cell. It is also often common to add sidebars for relevant clinical details, gene mutation rate, and/or sample mutation burden. There are some web-based tools that now allow you to create such views for pre-loaded data. For example, the cBioPortal provides such visualizations for much of their pre-loaded TCGA and ICGC data and their excellent OncoPrinter tool also allows custom data input. However, in many cases, producing publication-ready waterfall plots requires further customization. Such custom plots have historically been created through ad hoc R plotting or even manually in Excel (yikes!). In other cases, automated generation of plots for multiple sets of genes (e.g., pathway by pathway views) is desired. To address the needs for automation, customization and accessibility we have created the GenVisR package for Genomic Visualizations in R. The waterfall function is just one of many convenient functions for the production of highly customizable publication quality graphics for genomic data primarily at the cohort level.

In this tutorial we will demonstrate the use of the GenVisR waterfall function. We will recreate such a plot as that shown below and recently published in Ma et al (2015).

waterfall plot from Ma et al (2015)

The first required step is to install GenVisR. First, make sure that you have the latest version of R (3.3.0 or later) available from CRAN and launch an R session. GenVisR is available through BioConductor and can be installed by the usual method. At an R prompt, we will install GenVisR and load the GenVisR library as follows:


Now, lets get the mutation data for Ma et al 2015. This is available as Supplementary Table S3 at the paper's Supplementary Data page. I opened this excel file and saved it as a tab-delimited text file for import into R. Take note of where you saved that file and import it into R. The read.table function is a useful tool for this purpose.

mutation_data=read.table(file="~/Downloads/152934_1_supp_3139930_n6h2q6.txt", header=TRUE, sep="\t")

We will need to rename the column headings for “patient”, “gene name”, and “trv type” to one of the sets of supported column headings (see ?waterfall for examples). Note that the trv_types (mutation types) in this case follow the style of the MGI annotator as documented here. Other formats (standard MAF or custom) are also supported.


Finally, let's run the waterfall command. Note, a few options are specified to add x-axis labels and cell labels.

pdf(file="~/Dropbox/BioStars/GenVisR_waterfall_example1.pdf", width=12, height=8)
waterfall(mutation_data, fileType="MGI", mainXlabel=TRUE, mainLabelCol="amino.acid.change", mainLabelSize=2)

You should see a waterfall plot very similar to that published in Ma et al (2015).

enter image description here

GenVisR Tutorials currently available at BioStars:

ADD COMMENTlink modified 7 months ago by rahel1435010 • written 16 months ago by Obi Griffith15k

Thanks a zillion!

I scanned quickly the paper trying to find the meaning behind: Normal.ref.count, Normal.var.count, Normal.VAF, Tumor.ref.count, Tumor.var.count, Tumor.VAF. Unfortunately I couldn't. Sorry if this sounds a very simple question but VAF can have a lot of meaning in my genetic dictionary ;) and count could also mean a lot of things to me ... so, would you kindly explain it?

Thank you again

ADD REPLYlink written 16 months ago by H.Hasani570

VAF in this case refers to variant allele fraction (frequency). "ref.count" is the number of reads supporting the reference allele. "var.count" is the number of reads supporting the variant allele. VAF = var.count/(ref.count+var.count).

ADD REPLYlink written 16 months ago by Obi Griffith15k

Dear Obi,

Can we add the percentage of mutation for each gene per sample on this plot?

Kind Regards, Rahel

ADD REPLYlink written 7 months ago by rahel1435010

I'm not sure what you are asking. Do you mean the variant allele frequency for each gene/sample? As a text value in each cell? Or, something else?

ADD REPLYlink written 7 months ago by Obi Griffith15k

Dear Obi, Many thanks for your reply. Yes, I mean that to add the VAF to each cell. I have this AF value in mutect2 out put and also %frequency from Varscan2 output, I figured out that I can add them as separate column in my input data and then add it to the watefall command, but I am not sure if it is correct. Is there any way to extract the VAF values for each cell (gene per sample) from GenVisR and tell the command to write the VAF value on each cell? Kind regards, Rahel

ADD REPLYlink written 7 months ago by rahel1435010

Yes. Just look at the format of the "mutation_data" object. If you can add another column with VAF values then you should be able to just specify mainLabelCol="X" where X is the name of your new VAF column rather instead of "amino.acid.change".

ADD REPLYlink written 7 months ago by Obi Griffith15k

Hi Griffith,

The mutation data which I'm using has


frameshift indel    inframe indel     missense SNV     nonsense SNV         stoploss 
        1352               354            10206              808                5 
  synonymous SNV 

Is filetype "Custom" ? And when I run the data I got the following error:

most_deleterious <- c("missense SNV","frameshift indel","nonsense SNV","inframe indel","stoploss","synonymous SNV")

waterfall(SMutations, fileType="Custom", variant_class_order = most_deleterious, mainXlabel=TRUE, mainLabelCol="vaf", mainLabelSize=2, rmvSilent = TRUE)

Checking if input is properly formatted...
Detected "Custom" file_type flag, looking for correct column names...
Calculating frequency of mutations...
Removing silent mutations...
setting mutation hierarchy...
Error in waterfall_hierarchyTRV(data_frame, fileType, variant_class_order) : 

Detected an invalid mutation type, valid values for Custom are: missense SNV, frameshift indel, nonsense SNV, inframe indel, stoploss, synonymous SNV
In addition: Warning message:
In waterfall_Custom2anno(x, label_col) :
 Did not detect silent mutations in input, is this expected?

P.S: I see that in "Variant_class" there are also "NA"s. Could you please help me in this.

Thanks in Advance

ADD REPLYlink modified 6 months ago • written 6 months ago by vignesh90

Hi dsp,

you mention the original input has NA's, are those in the "variant_class" column? If so adding NA to the most_deleterious variable should resolve the issue: most_deleterious <- c("missense SNV","frameshift indel","nonsense SNV","inframe indel","stoploss","synonymous SNV", NA)

ADD REPLYlink written 6 months ago by zlskidmore280

Thanks for the reply. It worked.

ADD REPLYlink written 6 months ago by vignesh90

Dear Griffith, Is there a way to plot the "Mutations per MB" in a separate plot not as a part of Waterfall using GeneVisR? Something like figure 1 in Alexandrov et al., 2016 https://d2ufo47lrtsv5s.cloudfront.net/content/sci/354/6312/618/F1.large.jpg Please let me know if I had to make a new question for this ... Many thanks in advance, Rahel

ADD REPLYlink modified 6 months ago • written 6 months ago by rahel1435010

Hi Rahel,

The link which you posted is not working. If you want to give an image Go to (https://imgsafe.org/) and browse the image u need, copy the link.

Above the comment box, click the image and Paste the link you copied before. Better create a new question.

Thank you

ADD REPLYlink written 6 months ago by vignesh90

Many thanks for your advice. I did create a new question for that: Plotting numbers of somatic substitutions per megabase (MB)

ADD REPLYlink written 6 months ago by rahel1435010

Dear Obi, I have some cases that do not have mutation for my candidate genes (NaN or NA), but I still want these cases to be present in the plot,do you know how should I format the input data?

ADD REPLYlink written 3 months ago by xiekunlin19860

Depending on the version of GenVisR you are using there are two options, using GenVisR's internal dataset as an example:

option 1. use the plot samples parameter

samples <- c(as.character(unique(brcaMAF$Tumor_Sample_Barcode)), "test1", "test2")
waterfall(brcaMAF, mainRecurCutoff=.1, plotSamples=samples)

option 2. make a dataframe with those samples, NA out everything else

tmp <- data.frame("Tumor_Sample_Barcode"=c("test1", "test2"))
tmp2 <- plyr::rbind.fill(tmp, brcaMAF)
waterfall(tmp2, mainRecurCutoff=.1)
ADD REPLYlink modified 3 months ago • written 3 months ago by zlskidmore280

Thanks for kind explanation! I would like to ask a one more question. Is there any other option to enlarge the size of sample name, I mean, the size of x-axis label?

Best, Caroline

ADD REPLYlink written 12 weeks ago by carolinie03060

You could do that indirectly, because all of the plots in GenVisR are based on ggplot2 you can just add a layer, something like this should work

 # load libraries

# create ggplot2 layer to alter axis text size (in this case we're overriding an existing layer, so its good to set the angle as well)
layer <- list(theme(axis.text.x=element_text(size=20, angle=90)))

# pass layer to GenVisR
waterfall(brcaMAF, mainRecurCutoff=.1, mainLayer=layer)
ADD REPLYlink modified 12 weeks ago • written 12 weeks ago by zlskidmore280

Oh, I should try it! Thanks a lot :)

ADD REPLYlink written 12 weeks ago by carolinie03060
gravatar for tangming2005
16 months ago by
Houston/MD Anderson Cancer Center
tangming20052.1k wrote:

In the complexHeatmap package, there is a function called oncoprint to do similar stuff. https://bioconductor.org/packages/release/bioc/vignettes/ComplexHeatmap/inst/doc/s8.oncoprint.html

ADD COMMENTlink written 16 months ago by tangming20052.1k
gravatar for poisonAlien
16 months ago by
poisonAlien2.3k wrote:

check out maftools. Oncplot wraps around oncorprint script from CmplexHeatmap but directly works on maf files (mutation data).

ADD COMMENTlink modified 16 months ago • written 16 months ago by poisonAlien2.3k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 476 users visited in the last hour