GenomicFeatures: Select exons by transcript. Any faster option than exonsBy?
Entering edit mode
5.7 years ago
stianlagstad ★ 1.1k

Consider this code:


txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

  # Option 1:
    vals = list(tx_name = c("uc001aaa.3", "uc010nxq.1")),
    columns = list("EXONNAME", "TXNAME", "GENEID")),
  # Option 2:
  GenomicFeatures::exonsBy(txdb, by = "tx", use.names = TRUE)[c("uc001aaa.3", "uc010nxq.1")],
  times = 10

# Option 1 takes an average of 1.6 seconds
# Option 2 takes an average of 6.5 seconds

These two options gets me the same data, but option1 is a lot faster. Is there an easy way I can use option1 and still get the structure that is option2? Or is there a way to make exonsBy faster by only extracting the transcripts I'm interested in?

My end goal is to get a GRangesList object with one GRanges object per transcript, OR a single GRanges object with duplicate entries for exons that appear in multiple transcripts (if that's possible). To begin with I only have the transcript names and the TxDb object.


I started working on an example to help explain what I want to accomplish. Below I'm trying two approaches, using exonsBy() (which succeeds but is slow) and using exons() (which fails but retrieves data faster):


txdbhg19 <- GenomicFeatures::makeTxDbFromBiomart(
  dataset = "hsapiens_gene_ensembl",
  host = "")

transcriptNames <- c("ENST00000370032", "ENST00000420055")

# ----- APPROACH 1

# Get GrangesList
tst <- exonsBy(txdbhg19, by = "tx", use.names=TRUE)[transcriptNames]
# Unlist to get GRanges
tst <- unlist(tst)
# Add transcript metadata column to make Gviz group correctly
elementMetadata(tst)$transcript <- names(tst)
# Create track and plot
trTrack <- Gviz::GeneRegionTrack(tst)
# This is the plot I want to produce

# ----- APPROACH 2

tst2 <- GenomicFeatures::exons(
  vals = list(tx_name = transcriptNames),
  columns = list("EXONNAME", "TXNAME", "GENEID"))

# Problem 1: The TXNAME metadata field now contains transcripts that are not in my original transcript list:
unique(unlist(elementMetadata(tst2)$TXNAME[!elementMetadata(tst2)$TXNAME %in% transcriptNames]))
# [1] "ENST00000370031" "ENST00000402983"
# I can hack this out, but it feels unclean to do this:
elementMetadata(tst2)$TXNAME <- elementMetadata(tst2)$TXNAME[elementMetadata(tst2)$TXNAME %in% transcriptNames]
# I would much rather have it so that I only get the transcript names that I originally wanted. Anyhow, let's continue..

# Each entry in elementMetadata(tst2)$TXNAME now has 1 or more transcript names in it. This is not what I want. I want to somehow "expand" this GRanges object, so that each row only have one TXNAME value. Here's a very naive way to do it:

gr <- GRanges()
for (i in 1:length(tst2)) {
  trNames <- elementMetadata(tst2[i])$TXNAME[[1]]
  trNames <- strsplit(trNames, " ")
  for (j in 1:length(trNames)) {
    newGr <- tst2[i]
    elementMetadata(newGr)$TXNAME <- trNames[[j]]
    gr <- append(gr, newGr)
    print(paste(elementMetadata(tst2[i])$GENEID, trNames[[j]]))
elementMetadata(gr)$transcript <- elementMetadata(gr)$TXNAME

trTrack <- Gviz::GeneRegionTrack(gr)
# Error in (function (classes, fdef, mtable)  :
# unable to find an inherited method for function ‘displayPars<-’ for signature ‘"GRanges", "list"’

# Why do I get this error? I was hoping that this gr structure would be similar to the tst structure in approach 1

The goal is to get a structure similar to the one I get with exonsBy() -> unlist(). Any ideas?

GenomicFeatures • 2.5k views

Login before adding your answer.

Traffic: 1689 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6