This tutorial makes use of the GenVisR package. Please cite:
Skidmore ZL, Wagner AH, Lesurf R, Campbell KM, Kunisaki J, Griffith OL, Griffith M. 2016. GenVisR: Genomic Visualizations in R. Bioinformatics. pii: btw325. [Epub ahead of print]
Upon sequencing a set of tumor samples, from small to large cohorts, a common first step is to analyze/visualize the recurrence and co-occurrence of gene-level mutations. If you look in almost any recent TCGA or ICGC paper you will find plots similar to the example below where a matrix of genes by samples is shown with mutation type indicated by coloring of each matrix cell. It is also often common to add sidebars for relevant clinical details, gene mutation rate, and/or sample mutation burden. There are some web-based tools that now allow you to create such views for pre-loaded data. For example, the cBioPortal provides such visualizations for much of their pre-loaded TCGA and ICGC data and their excellent OncoPrinter tool also allows custom data input. However, in many cases, producing publication-ready waterfall plots requires further customization. Such custom plots have historically been created through ad hoc R plotting or even manually in Excel (yikes!). In other cases, automated generation of plots for multiple sets of genes (e.g., pathway by pathway views) is desired. To address the needs for automation, customization and accessibility we have created the GenVisR package for Genomic Visualizations in R. The waterfall function is just one of many convenient functions for the production of highly customizable publication quality graphics for genomic data primarily at the cohort level.
In this tutorial we will demonstrate the use of the GenVisR waterfall function. We will recreate such a plot as that shown below and recently published in Ma et al (2015).
The first required step is to install GenVisR. First, make sure that you have the latest version of R (3.3.0 or later) available from CRAN and launch an R session. GenVisR is available through BioConductor and can be installed by the usual method. At an R prompt, we will install GenVisR and load the GenVisR library as follows:
source("https://bioconductor.org/biocLite.R") biocLite("GenVisR") library(GenVisR)
Now, lets get the mutation data for Ma et al 2015. This is available as Supplementary Table S3 at the paper's Supplementary Data page. I opened this excel file and saved it as a tab-delimited text file for import into R. Take note of where you saved that file and import it into R. The read.table function is a useful tool for this purpose.
mutation_data=read.table(file="~/Downloads/152934_1_supp_3139930_n6h2q6.txt", header=TRUE, sep="\t")
We will need to rename the column headings for “patient”, “gene name”, and “trv type” to one of the sets of supported column headings (see
?waterfall for examples). Note that the trv_types (mutation types) in this case follow the style of the MGI annotator as documented here. Other formats (standard MAF or custom) are also supported.
Finally, let's run the waterfall command. Note, a few options are specified to add x-axis labels and cell labels.
pdf(file="~/Dropbox/BioStars/GenVisR_waterfall_example1.pdf", width=12, height=8) waterfall(mutation_data, fileType="MGI", mainXlabel=TRUE, mainLabelCol="amino.acid.change", mainLabelSize=2) dev.off()
You should see a waterfall plot very similar to that published in Ma et al (2015).
GenVisR Tutorials currently available at BioStars: