trying to make nice-looking plot of gene structure with mutations
0
1
Entering edit mode
2.5 years ago

I'd like to do something in R that seems rather simple: draw a nice plot of a human gene, with exons and introns marked (maybe even using a different color for CDS vs. UTRs, but I can live without that for now), with the position of a bunch of mutations overlaid. I've tried a few packages for this, the best of which (so far, has been ggbio. Here's some code for reproducibility:

library(Homo.sapiens)
library(ggbio)
library(tidyverse)

# first get gene structure information from Homo.sapiens annotation package
genes_all <- AnnotationDbi::keys(Homo.sapiens, keytype='GENEID')
generanges <- AnnotationDbi::select(Homo.sapiens, 
                      keys=genes_all, keytype='GENEID',
                      columns=c('GENEID', 'SYMBOL', 'TXCHROM', 'TXSTART', 'TXEND'))
generanges <- drop_na(generanges)  # need to get rid of these in order to indexm later
# convert to GRanges object
generanges <- makeGRangesFromDataFrame(generanges, keep.extra.columns=T)
# we'll use RSPO1 gene as an example
generanges[generanges$SYMBOL=='RSPO1']
# simulate 200 SNPs randomly scattered through RSPO1
snps <- sample(start(generanges[generanges$SYMBOL=='RSPO1']):end(generanges[generanges$SYMBOL=='RSPO1']),
               200, replace=F)

# plot gene structures with autoplot, add SNP positions with geom_jitter
autoplot(Homo.sapiens, which=generanges[generanges$SYMBOL=='RSPO1'], 
         gap.geom='chevron', col='blue', fill='blue') + 
  geom_jitter(data=tibble(pos=snps), aes(x=pos, y=1), col='red', alpha=0.25) +
  theme_bw()

The resulting plot looks like this: RSPO1 gene plot with 4 transcripts, overlaid with SNPs in red

This is not a bad start, but I would really like to show only a single transcript, not all four splice forms - e.g. the "canonical" Refseq transcript. This is just for ease of visualization/presentation. I think what I need to do is to subset the annotation object Homo.sapiens, so that it contains only Refseq transcripts, and then pass that object to the autoplot command. I'm not sure how to do this, though - or if there is a better way to achieve this goal with a different package. (Note that I tried dandelion and lollipop plots using the trackViewer package, but the SNP density is way too high to visualize that way. Also, I really like the fact that ggbio can be combined with traditional ggplot commands, as I've done here.)

ggbio genome mutation visualization r • 1.8k views
ADD COMMENT
1
Entering edit mode

You may want to consider getting MANE select transcripts from NCBI. Read about the project here.

ADD REPLY
0
Entering edit mode

Thanks for this suggestion. Unfortunately, though, not all the genes I'm looking at are covered by MANE. In the end, I was able to kludge together a solution using ggbio for plotting and a combination of biovizBase, Homo.sapiens and TxDb.Hsapiens.UCSC.hg19.knownGene to pull out and draw only the Refseq transcript for each gene, which I could then decorate with ggplot.

ADD REPLY

Login before adding your answer.

Traffic: 2015 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6