GenomicFeatures: Select exons by transcript. Any faster option than exonsBy?
0
0
Entering edit mode
8.1 years ago
stianlagstad ★ 1.1k

Consider this code:

library(microbenchmark)
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

microbenchmark::microbenchmark(
  # Option 1:
  GenomicFeatures::exons(
    txdb,
    vals = list(tx_name = c("uc001aaa.3", "uc010nxq.1")),
    columns = list("EXONNAME", "TXNAME", "GENEID")),
  # Option 2:
  GenomicFeatures::exonsBy(txdb, by = "tx", use.names = TRUE)[c("uc001aaa.3", "uc010nxq.1")],
  times = 10
)

# Option 1 takes an average of 1.6 seconds
# Option 2 takes an average of 6.5 seconds

These two options gets me the same data, but option1 is a lot faster. Is there an easy way I can use option1 and still get the structure that is option2? Or is there a way to make exonsBy faster by only extracting the transcripts I'm interested in?

My end goal is to get a GRangesList object with one GRanges object per transcript, OR a single GRanges object with duplicate entries for exons that appear in multiple transcripts (if that's possible). To begin with I only have the transcript names and the TxDb object.

_Edit_

I started working on an example to help explain what I want to accomplish. Below I'm trying two approaches, using exonsBy() (which succeeds but is slow) and using exons() (which fails but retrieves data faster):

library(Gviz)
library(GenomicFeatures)

options(ucscChromosomeNames=FALSE)
txdbhg19 <- GenomicFeatures::makeTxDbFromBiomart(
  biomart = "ENSEMBL_MART_ENSEMBL",
  dataset = "hsapiens_gene_ensembl",
  host = "dec2013.archive.ensembl.org")

transcriptNames <- c("ENST00000370032", "ENST00000420055")

# ----- APPROACH 1

# Get GrangesList
tst <- exonsBy(txdbhg19, by = "tx", use.names=TRUE)[transcriptNames]
# Unlist to get GRanges
tst <- unlist(tst)
# Add transcript metadata column to make Gviz group correctly
elementMetadata(tst)$transcript <- names(tst)
# Create track and plot
trTrack <- Gviz::GeneRegionTrack(tst)
Gviz::plotTracks(trTrack)
# This is the plot I want to produce

# ----- APPROACH 2

tst2 <- GenomicFeatures::exons(
  txdbhg19,
  vals = list(tx_name = transcriptNames),
  columns = list("EXONNAME", "TXNAME", "GENEID"))

# Problem 1: The TXNAME metadata field now contains transcripts that are not in my original transcript list:
unique(unlist(elementMetadata(tst2)$TXNAME[!elementMetadata(tst2)$TXNAME %in% transcriptNames]))
# [1] "ENST00000370031" "ENST00000402983"
# I can hack this out, but it feels unclean to do this:
elementMetadata(tst2)$TXNAME <- elementMetadata(tst2)$TXNAME[elementMetadata(tst2)$TXNAME %in% transcriptNames]
# I would much rather have it so that I only get the transcript names that I originally wanted. Anyhow, let's continue..

# Each entry in elementMetadata(tst2)$TXNAME now has 1 or more transcript names in it. This is not what I want. I want to somehow "expand" this GRanges object, so that each row only have one TXNAME value. Here's a very naive way to do it:

gr <- GRanges()
for (i in 1:length(tst2)) {
  trNames <- elementMetadata(tst2[i])$TXNAME[[1]]
  trNames <- strsplit(trNames, " ")
  for (j in 1:length(trNames)) {
    newGr <- tst2[i]
    elementMetadata(newGr)$TXNAME <- trNames[[j]]
    gr <- append(gr, newGr)
    print(paste(elementMetadata(tst2[i])$GENEID, trNames[[j]]))
  }
}
elementMetadata(gr)$transcript <- elementMetadata(gr)$TXNAME

trTrack <- Gviz::GeneRegionTrack(gr)
Gviz::plotTracks(gr)
# Error in (function (classes, fdef, mtable)  :
# unable to find an inherited method for function ‘displayPars<-’ for signature ‘"GRanges", "list"’

# Why do I get this error? I was hoping that this gr structure would be similar to the tst structure in approach 1

The goal is to get a structure similar to the one I get with exonsBy() -> unlist(). Any ideas?

GenomicFeatures • 3.3k views
ADD COMMENT

Login before adding your answer.

Traffic: 3022 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6