Tutorial:How to create a mutation landscape (waterfall) plot with GenVisR
2
63
Entering edit mode
7.1 years ago

This tutorial makes use of the GenVisR package. Please cite:

Skidmore ZL, Wagner AH, Lesurf R, Campbell KM, Kunisaki J, Griffith OL, Griffith M. 2016. GenVisR: Genomic Visualizations in R. Bioinformatics. pii: btw325. [Epub ahead of print]

Note: A more comprehensive version of this tutorial is also available as part of our Genomic Visualization and Interpretation course at www.genviz.org.

Upon sequencing a set of tumor samples, from small to large cohorts, a common first step is to analyze/visualize the recurrence and co-occurrence of gene-level mutations. If you look in almost any recent TCGA or ICGC paper you will find plots similar to the example below where a matrix of genes by samples is shown with mutation type indicated by coloring of each matrix cell. It is also often common to add sidebars for relevant clinical details, gene mutation rate, and/or sample mutation burden. There are some web-based tools that now allow you to create such views for pre-loaded data. For example, the cBioPortal provides such visualizations for much of their pre-loaded TCGA and ICGC data and their excellent OncoPrinter tool also allows custom data input. However, in many cases, producing publication-ready waterfall plots requires further customization. Such custom plots have historically been created through ad hoc R plotting or even manually in Excel (yikes!). In other cases, automated generation of plots for multiple sets of genes (e.g., pathway by pathway views) is desired. To address the needs for automation, customization and accessibility we have created the GenVisR package for Genomic Visualizations in R. The waterfall function is just one of many convenient functions for the production of highly customizable publication quality graphics for genomic data primarily at the cohort level.

In this tutorial we will demonstrate the use of the GenVisR waterfall function. We will recreate such a plot as that shown below and recently published in Ma et al (2015).

The first required step is to install GenVisR. First, make sure that you have the latest version of R (3.3.0 or later) available from CRAN and launch an R session. GenVisR is available through BioConductor and can be installed by the usual method. At an R prompt, we will install GenVisR and load the GenVisR library as follows:

source("https://bioconductor.org/biocLite.R")
biocLite("GenVisR")
library(GenVisR)


Now, lets get the mutation data for Ma et al 2015. This is available as Supplementary Table S3 at the paper's Supplementary Data page. I opened this excel file and saved it as a tab-delimited text file for import into R. Take note of where you saved that file and import it into R. The read.table function is a useful tool for this purpose.

mutation_data=read.table(file="~/Downloads/152934_1_supp_3139930_n6h2q6.txt", header=TRUE, sep="\t")


We will need to rename the column headings for “patient”, “gene name”, and “trv type” to one of the sets of supported column headings (see ?waterfall for examples). Note that the trv_types (mutation types) in this case follow the style of the MGI annotator as documented here. Other formats (standard MAF or custom) are also supported.

colnames(mutation_data)[c(1,8,12)]=c("sample","gene_name","trv_type")


Finally, let's run the waterfall command. Note, a few options are specified to add x-axis labels and cell labels.

pdf(file="~/Dropbox/BioStars/GenVisR_waterfall_example1.pdf", width=12, height=8)
waterfall(mutation_data, fileType="MGI", mainXlabel=TRUE, mainLabelCol="amino.acid.change", mainLabelSize=2)
dev.off()


You should see a waterfall plot very similar to that published in Ma et al (2015).

GenVisR Tutorials currently available at BioStars:

BioConductor GenVisR Graphics R • 28k views
0
Entering edit mode

Thanks a zillion!

I scanned quickly the paper trying to find the meaning behind: Normal.ref.count, Normal.var.count, Normal.VAF, Tumor.ref.count, Tumor.var.count, Tumor.VAF. Unfortunately I couldn't. Sorry if this sounds a very simple question but VAF can have a lot of meaning in my genetic dictionary ;) and count could also mean a lot of things to me ... so, would you kindly explain it?

Thank you again

0
Entering edit mode

VAF in this case refers to variant allele fraction (frequency). "ref.count" is the number of reads supporting the reference allele. "var.count" is the number of reads supporting the variant allele. VAF = var.count/(ref.count+var.count).

0
Entering edit mode

Dear Obi,

Can we add the percentage of mutation for each gene per sample on this plot?

Kind Regards, Rahel

1
Entering edit mode

I'm not sure what you are asking. Do you mean the variant allele frequency for each gene/sample? As a text value in each cell? Or, something else?

0
Entering edit mode

Dear Obi, Many thanks for your reply. Yes, I mean that to add the VAF to each cell. I have this AF value in mutect2 out put and also %frequency from Varscan2 output, I figured out that I can add them as separate column in my input data and then add it to the watefall command, but I am not sure if it is correct. Is there any way to extract the VAF values for each cell (gene per sample) from GenVisR and tell the command to write the VAF value on each cell? Kind regards, Rahel

0
Entering edit mode

Yes. Just look at the format of the "mutation_data" object. If you can add another column with VAF values then you should be able to just specify mainLabelCol="X" where X is the name of your new VAF column rather instead of "amino.acid.change".

0
Entering edit mode

Hi Griffith,

The mutation data which I'm using has

 table(SMutations$variant_class) frameshift indel inframe indel missense SNV nonsense SNV stoploss 1352 354 10206 808 5 synonymous SNV 3  Is filetype "Custom" ? And when I run the data I got the following error: most_deleterious <- c("missense SNV","frameshift indel","nonsense SNV","inframe indel","stoploss","synonymous SNV") waterfall(SMutations, fileType="Custom", variant_class_order = most_deleterious, mainXlabel=TRUE, mainLabelCol="vaf", mainLabelSize=2, rmvSilent = TRUE) Checking if input is properly formatted... Detected "Custom" file_type flag, looking for correct column names... Calculating frequency of mutations... Removing silent mutations... setting mutation hierarchy... Error in waterfall_hierarchyTRV(data_frame, fileType, variant_class_order) : Detected an invalid mutation type, valid values for Custom are: missense SNV, frameshift indel, nonsense SNV, inframe indel, stoploss, synonymous SNV In addition: Warning message: In waterfall_Custom2anno(x, label_col) : Did not detect silent mutations in input, is this expected?  P.S: I see that in "Variant_class" there are also "NA"s. Could you please help me in this. Thanks in Advance ADD REPLY 1 Entering edit mode Hi dsp, you mention the original input has NA's, are those in the "variant_class" column? If so adding NA to the most_deleterious variable should resolve the issue: most_deleterious <- c("missense SNV","frameshift indel","nonsense SNV","inframe indel","stoploss","synonymous SNV", NA) ADD REPLY 0 Entering edit mode Thanks for the reply. It worked. ADD REPLY 0 Entering edit mode Dear Griffith, Is there a way to plot the "Mutations per MB" in a separate plot not as a part of Waterfall using GeneVisR? Something like figure 1 in Alexandrov et al., 2016 https://d2ufo47lrtsv5s.cloudfront.net/content/sci/354/6312/618/F1.large.jpg Please let me know if I had to make a new question for this ... Many thanks in advance, Rahel ADD REPLY 1 Entering edit mode Hi Rahel, The link which you posted is not working. If you want to give an image Go to (https://imgsafe.org/) and browse the image u need, copy the link. Above the comment box, click the image and Paste the link you copied before. Better create a new question. Thank you ADD REPLY 0 Entering edit mode Many thanks for your advice. I did create a new question for that: Plotting numbers of somatic substitutions per megabase (MB) ADD REPLY 0 Entering edit mode Dear Obi, I have some cases that do not have mutation for my candidate genes (NaN or NA), but I still want these cases to be present in the plot,do you know how should I format the input data? ADD REPLY 0 Entering edit mode Depending on the version of GenVisR you are using there are two options, using GenVisR's internal dataset as an example: option 1. use the plot samples parameter samples <- c(as.character(unique(brcaMAF$Tumor_Sample_Barcode)), "test1", "test2")

waterfall(brcaMAF, mainRecurCutoff=.1, plotSamples=samples)


option 2. make a dataframe with those samples, NA out everything else

tmp <- data.frame("Tumor_Sample_Barcode"=c("test1", "test2"))

tmp2 <- plyr::rbind.fill(tmp, brcaMAF)

waterfall(tmp2, mainRecurCutoff=.1)

0
Entering edit mode

Thanks for kind explanation! I would like to ask a one more question. Is there any other option to enlarge the size of sample name, I mean, the size of x-axis label?

Best, Caroline

0
Entering edit mode

You could do that indirectly, because all of the plots in GenVisR are based on ggplot2 you can just add a layer, something like this should work

 # load libraries
library(GenVisR)
library(ggplot2)

# create ggplot2 layer to alter axis text size (in this case we're overriding an existing layer, so its good to set the angle as well)
layer <- list(theme(axis.text.x=element_text(size=20, angle=90)))

# pass layer to GenVisR
waterfall(brcaMAF, mainRecurCutoff=.1, mainLayer=layer)

0
Entering edit mode

Oh, I should try it! Thanks a lot :)

0
Entering edit mode

Dear @zlskidmore Is it possible to remove the bar plot of %Mutant in the y lab and just keep the gene name? As I am considering only 5 genes it is not necessary and take space. How can I make any changes in the %Mutant plot? I really appreciate your time and help!

0
Entering edit mode

How can I change the gridlines? I tried with mainLayer = theme(.....), but major grid is showing in the middle of the divisions, and minor grid is not showing at all. I just want to change the grid lines of the main graph for black lines with specific sizes!

0
Entering edit mode

How can I change the gridlines? I tried with mainLayer = theme(.....), but major grid is showing in the middle of the divisions, and minor grid is not showing at all. I just want to change the grid lines of the main graph for black lines with specific sizes!

0
Entering edit mode

Many thanks @Obi Griffith for this nice tutorial. Is it possible to remove the bar plot of %Mutant in the y lab and just keep the gene name? As I am considering only 5 genes it is not necessary and take space.

0
Entering edit mode

Hi @Obi, How can we change the size of text in the x-axis of %Mutant plot next to the main plot? I tried to do it by theme but didn't work. Many thanks!

0
Entering edit mode

Fantastic tool, thank you very much Griffiths lab (Obi Griffith, zlskidmore, Malachi Griffith ). I have a question:

Is it possible to visualise the different mutations falling in the same gene in a given sample on a waterfall plot? I can give an example, using the waterfall plot given in this tutorial. If we assume that sample 19 has 2 different missense mutations in PIK3CA (p.E542K and something else), is it possible to visualise the two in the same waterfall plot (for example, a second row for PIK3CA)?

Best wishes.

0
Entering edit mode

Hi, thanks for this wonderful tool! I have a question Obi Griffith I am trying to install the library GenVisR like the tutorial above, but even with the old versions of R, it is not installing, I tried several tutorials and in countless ways, but I am not able to solve the installation of the library. He always return similar message:

Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
there is no package called 'ggplot2'
Além disso: Warning message:
package 'GenVisR' was built under R version 3.3.2
Erro: package or namespace load failed for 'GenVisR'


I hope someone can help me.

7
Entering edit mode
7.1 years ago
Ming Tommy Tang ★ 2.9k

In the complexHeatmap package, there is a function called oncoprint to do similar stuff. https://bioconductor.org/packages/release/bioc/vignettes/ComplexHeatmap/inst/doc/s8.oncoprint.html

3
Entering edit mode

Yes, I have used OncoPrint and it is very powerful. ComplexHeamap, generally, is one of the best R packages that has ever been created.

6
Entering edit mode
7.1 years ago
poisonAlien ★ 3.1k

check out maftools. Oncplot wraps around oncorprint script from CmplexHeatmap but directly works on maf files (mutation data).