How to choose output when using a "By" function from a TxDb object; GenomicFeatures R
2.2 years ago
jon.klonowski ▴ 150

I would like to get CDS information about coding sequences from a TxDb object but am struggling to find a good solution. It seems as if the TxDb object can hold so much information, but there is no way to extract the information outside of converting to a database object; this is troublesome because it increases computational time over manipulating a GenomicRanges object.

library('GenomicFeatures')
library(tidyverse)

Gencode <- makeTxDbFromGFF("./gencode.v39.basic.annotation.gff3.gz")
saveDb(Gencode, file="gencode.v39.basic.annotation.sqlite")
Gencode <- loadDb("gencode.v39.basic.annotation.sqlite")

# Get CDS grouped by transcript (Gencode is the TxDb object loaded above)
CDSbyTx <- cdsBy(Gencode, by = "tx", use.names = TRUE)

With this command, there doesn't appear to be a way to specify which information you want returned; you are stuck with:

seqnames      ranges strand |    cds_id          cds_name exon_rank

Despite the fact that

> columns(Gencode) 

returns 22 different columns.

I realize that I can use:

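CDSbyTx <- cdsBy(Gencode, by = "tx", use.names = TRUE)
keys <- names(CDSbyTx)
cols <- columns(Gencode)
# tidyverse is loaded after GenomicFeatures, so a bare select() resolves to dplyr::select
AnnotationDbi::select(Gencode, keys = keys, columns = cols, keytype = "TXNAME")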

but this creates a data.frame in which the CDS are not grouped by transcript, and it just seems really roundabout.

Is there a workaround?

TxDb GenomicFeatures genomics R • 1.4k views

Are you just trying to get a GRanges object with all CDSs and additional info like tx_id and such?
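
If that is the goal, here is a minimal sketch of two options. It assumes the small sample TxDb that ships with GenomicFeatures (`hg19_knownGene_sample.sqlite`) as a stand-in for the GENCODE database from the question, and the choice of extra columns (`tx_name`, `gene_id`) is only illustrative.

```r
library(GenomicFeatures)

# Assumption: the sample TxDb bundled with GenomicFeatures stands in for Gencode.
txdb <- loadDb(system.file("extdata", "hg19_knownGene_sample.sqlite",
                           package = "GenomicFeatures"))

# Option 1: a flat GRanges of CDS with extra metadata columns requested directly.
cds_flat <- cds(txdb, columns = c("cds_id", "tx_name", "gene_id"))

# Option 2: keep the by-transcript grouping and add columns afterwards.
cds_by_tx <- cdsBy(txdb, by = "tx", use.names = TRUE)
flat <- unlist(cds_by_tx, use.names = FALSE)
tx_of_cds <- rep(names(cds_by_tx), elementNROWS(cds_by_tx))

# Look up a gene ID per transcript name; the AnnotationDbi:: prefix
# avoids a clash with dplyr::select when tidyverse is attached.
anno <- AnnotationDbi::select(txdb, keys = unique(names(cds_by_tx)),
                              columns = "GENEID", keytype = "TXNAME")
mcols(flat)$tx_name <- tx_of_cds
mcols(flat)$gene_id <- anno$GENEID[match(tx_of_cds, anno$TXNAME)]

# Restore the original per-transcript grouping as a GRangesList.
cds_annotated <- relist(flat, cds_by_tx)
```

Option 2 uses `match()`, so if a transcript maps to more than one gene only the first gene ID is kept; that simplification may or may not suit a given annotation.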

jon.klonowski Do not delete posts that have received feedback. Instead, interact with the people investing effort in your problem and if the problem resolved itself or their feedback helped, let everyone know.

It was a dumb question. Everything I needed was there; I just got a little scatterbrained, @Ram.

That's OK, it happens to everyone. Just leave a comment saying what now-so-obvious thing you missed and I guarantee you, someone else will miss the exact same thing and find that your comment saves them at least a few hours.

2.2 years ago
jon.klonowski ▴ 150

cdsBy() and exonsBy() return similar output, with one being CDS-specific and the other exon-specific. Other than the column names, the only real difference is that the ranges returned by cdsBy() are the coding sequences, while exonsBy() returns the exons. So difficult to understand, wow.
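
For anyone who wants to see the difference side by side, here is a rough sketch. It assumes the sample TxDb bundled with GenomicFeatures rather than the GENCODE database from the question, and it simply picks the first transcript that has a CDS.

```r
library(GenomicFeatures)

# Assumption: the sample TxDb bundled with GenomicFeatures stands in for Gencode.
txdb <- loadDb(system.file("extdata", "hg19_knownGene_sample.sqlite",
                           package = "GenomicFeatures"))

# With the default use.names = FALSE, both lists are named by internal
# transcript ID, so the same key can be used to index either one.
cds_by_tx   <- cdsBy(txdb, by = "tx")
exons_by_tx <- exonsBy(txdb, by = "tx")

tx_id    <- names(cds_by_tx)[1]
tx_cds   <- cds_by_tx[[tx_id]]
tx_exons <- exons_by_tx[[tx_id]]

mcols(tx_cds)    # columns: cds_id, cds_name, exon_rank
mcols(tx_exons)  # columns: exon_id, exon_name, exon_rank

# Each CDS segment should lie within an exon of the same transcript.
cds_inside_exons <- all(overlapsAny(tx_cds, tx_exons, type = "within"))
```

The CDS ranges are the coding portions of the exons, so they are expected to be the same or shorter, never longer.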

I honestly don't know what I was thinking with this question.
