This tutorial makes use of the GenVisR package. Please cite:
Skidmore ZL, Wagner AH, Lesurf R, Campbell KM, Kunisaki J, Griffith OL, Griffith M. 2016. GenVisR: Genomic Visualizations in R. Bioinformatics. pii: btw325. [Epub ahead of print]
PubMed | Bioinformatics Journal | BioRxiv | Bioconductor | GitHub
A commonly desired genomic visualization is the so-called mutation lolliplot. After identifying genes with recurrent mutations, the next step is often to visualize how those mutations are distributed across the coding space of the gene. This allows the viewer to identify locations of mutation clustering (hotspots), protein domains affected by mutations, or other patterns related to the position and type of mutation. Typically, a simple model of the protein-coding portion of a gene is shown with mutations marked by stacks of connected dots above and/or below the gene, possibly colored by mutation type, and with appropriate mutation, gene, and protein domain labels. There are several web-based tools for visualizing data in this manner. For example, St. Jude's excellent ProteinPaint application, available through their PeCan Data Portal, provides such visualizations for pre-loaded pediatric cancers and COSMIC data. The ICGC data portal and cBioPortal provide such visualizations for ICGC, TCGA and other data, as does COSMIC for its own massive pre-loaded datasets. The MutationMapper tool from cBioPortal allows custom mutation lists to be uploaded. A command-line tool developed by David Larson was the inspiration for GenVisR::lolliplot and is also available through the Genome Modeling System to create similar plots.
However, in many cases, producing publication-ready lolliplots requires further customization. A user may wish to visualize a custom dataset not included in the above web portals, or may wish to choose a different protein isoform or source of protein domain annotations. In other cases, automated generation of plots for multiple sets of genes (e.g., all recurrently mutated genes) is desired. Such custom plots have historically been created through ad hoc R plotting. To address the needs for automation, customization and accessibility, we have created the GenVisR package for Genomic Visualizations in R. The lolliplot function is just one of many convenient functions for producing highly customizable, publication-quality graphics for genomic data, primarily at the cohort level.
In this tutorial we will demonstrate the use of the GenVisR lolliplot function. We will create such a plot using data recently published in Ma et al (2015).
The first required step is to install GenVisR. First, make sure that you have the latest version of R (3.3.0 or later), available from CRAN, and launch an R session. GenVisR is available through Bioconductor and can be installed by the usual method. At an R prompt, we will install GenVisR and load the GenVisR library as follows:
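source("https://bioconductor.org/biocLite.R")
biocLite("GenVisR")
# On R 3.5 or later, biocLite has been replaced by BiocManager:
#   install.packages("BiocManager"); BiocManager::install("GenVisR")
library(GenVisR)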
Now, let's get the mutation data for Ma et al (2015). This is available as Supplementary Table S3 at the paper's Supplementary Data page. I opened this Excel file and saved it as a tab-delimited text file for import into R. Take note of where you saved that file and import it into R. The read.table function is a useful tool for this purpose.
mutation_data=read.table(file="~/Downloads/152934_1_supp_3139930_n6h2q6.txt", header=TRUE, sep="\t")
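Before renaming anything, it helps to know how read.table treats column headings. With its default check.names=TRUE, headings that contain spaces are converted to syntactic R names, so a heading such as "amino acid change" is imported as amino.acid.change. The following is a minimal sketch using a made-up one-row table (the headings and values are illustrative stand-ins, not the contents of Table S3); on the real data, colnames(mutation_data) shows what was actually imported.

```r
# Toy tab-delimited input; headings and values are illustrative only.
toy_text <- "gene_name\tamino acid change\ttranscript name\nPIK3CA\tp.H1047R\tENST00000263967"

# Default behaviour: headings with spaces become syntactic names (dots).
toy <- read.table(text = toy_text, header = TRUE, sep = "\t",
                  stringsAsFactors = FALSE)
print(colnames(toy))

# check.names = FALSE keeps the headings exactly as written in the file.
toy_raw <- read.table(text = toy_text, header = TRUE, sep = "\t",
                      stringsAsFactors = FALSE, check.names = FALSE)
print(colnames(toy_raw))
```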
We will need to rename the column headings for “gene_name”, “amino acid change”, and “transcript name” to the expected column headings (see ?lolliplot for details). Let's also rename trv_type (mutation type) as that will be used to color dots.
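One way to do the renaming is by name rather than by column position, so it keeps working if the column order changes. This is only a sketch: the source headings below (gene_name, trv_type, transcript.name, amino.acid.change) are assumptions about how the spreadsheet headings were imported, and the toy data frame is a made-up stand-in for mutation_data so that the snippet runs on its own. The target headings gene, transcript_name and amino_acid_change are the ones lolliplot expects; mutation_type is simply the name used later for fillCol.

```r
# Toy stand-in for mutation_data; rows are illustrative, not from Ma et al.
toy_data <- data.frame(
  gene_name = c("PIK3CA", "TP53"),
  trv_type = c("missense", "nonsense"),
  transcript.name = c("ENST00000263967", "ENST00000269305"),
  amino.acid.change = c("p.H1047R", "p.R213*"),
  stringsAsFactors = FALSE
)

# Assumed source headings (names) mapped to the desired headings (values).
new_names <- c(
  gene_name = "gene",
  trv_type = "mutation_type",
  transcript.name = "transcript_name",
  amino.acid.change = "amino_acid_change"
)

# Rename only the columns found in the map; all others are left untouched.
hits <- colnames(toy_data) %in% names(new_names)
colnames(toy_data)[hits] <- unname(new_names[colnames(toy_data)[hits]])
print(colnames(toy_data))
```

Applied to the real table, the same three lines would operate on mutation_data instead of toy_data, after adjusting the names in new_names to match colnames(mutation_data).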
Extract the data for just the PIK3CA gene for plotting.
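The extraction is an ordinary row subset on the gene column. The sketch below again uses a made-up toy data frame (illustrative rows, not the published data) and assumes the gene column has already been renamed to "gene".

```r
# Toy stand-in for the renamed mutation_data; rows are illustrative only.
toy_data <- data.frame(
  gene = c("PIK3CA", "TP53", "PIK3CA"),
  mutation_type = c("missense", "nonsense", "missense"),
  transcript_name = c("ENST00000263967", "ENST00000269305", "ENST00000263967"),
  amino_acid_change = c("p.H1047R", "p.R213*", "p.E545K"),
  stringsAsFactors = FALSE
)

# Keep only rows whose gene is PIK3CA; which() drops rows where gene is NA.
PIK3CA_toy <- toy_data[which(toy_data$gene == "PIK3CA"), ]
print(PIK3CA_toy)
```

On the real table, the equivalent expression on mutation_data would produce the PIK3CA_data object passed to lolliplot in the plotting step.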
Plot the lolliplot. We can customize the plot by coloring dots by mutation type, labelling by amino acid change, and tweaking the text size and angle. Note that the variant annotations from Ma et al (2015) were reported for Ensembl version 74 (see Patients and Methods). By default GenVisR uses the latest version of Ensembl. To ensure consistency between reported mutations and transcript annotation/structure you can specify the appropriate Ensembl archive for version 74 with the 'host' parameter.
pdf(file="~/Dropbox/BioStars/GenVisR_lolliplot_example1.pdf", width=12, height=4)
lolliplot(PIK3CA_data, fillCol="mutation_type", labelCol="amino_acid_change", txtSize=3, txtAngle=20, host="dec2013.archive.ensembl.org")
dev.off()
You should now have a basic lolliplot for PIK3CA mutations from the Ma et al (2015) data.
GenVisR Tutorials currently available at BioStars: