How to choose output when using a "By" function from a TxDb object; GenomicFeatures R
5 months ago
jon.klonowski ▴ 120

I would like to extract information about coding sequences (CDS) from a TxDb object, but I am struggling to find a good solution. The TxDb object seems to hold a lot of information, yet I see no way to extract it other than querying it as a database; this is troublesome because it takes more computational time than manipulating a GenomicRanges object.

library(GenomicFeatures)
library(tidyverse)

# Build the TxDb from the GENCODE annotation and save it for reuse
Gencode <- makeTxDbFromGFF("./gencode.v39.basic.annotation.gff3.gz")
saveDb(Gencode, file="gencode.v39.basic.annotation.sqlite")

# Get CDS by transcript:
CDSbyTx <- cdsBy(Gencode, by="tx", use.names=TRUE)


With this command there doesn't appear to be a way to specify which information you want returned; you are stuck with:

seqnames      ranges strand |    cds_id          cds_name exon_rank


Despite the fact that

> columns(Gencode)


returns 22 different columns.

I realize that I can use:

CDSbyTx <- cdsBy(Gencode, by="tx", use.names=TRUE)
keys <- names(CDSbyTx)
cols <- columns(Gencode)
# AnnotationDbi:: avoids the masking by dplyr::select() once tidyverse is loaded
AnnotationDbi::select(Gencode, keys = keys, columns = cols, keytype="TXNAME")


but this creates a data.frame in which the CDS are not grouped by transcript, and it seems really roundabout.

Is there a workaround?
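One possible way to regroup the select() output by transcript is to split the data.frame on its TXNAME column. This is only a minimal sketch: it assumes the small sample database shipped with GenomicFeatures (hg19_knownGene_sample.sqlite) as a stand-in for the Gencode TxDb, and the names txdb, cds_df and cds_df_by_tx are made up for illustration.

```r
library(GenomicFeatures)

# Assumption: the sample TxDb bundled with GenomicFeatures stands in for Gencode
txdb <- loadDb(system.file("extdata", "hg19_knownGene_sample.sqlite",
                           package = "GenomicFeatures"))

# Transcript names that actually have a CDS; keep a handful for the demo
cds_by_tx <- cdsBy(txdb, by = "tx", use.names = TRUE)
keys <- head(names(cds_by_tx), 5)

# Flat table with one row per CDS part (namespace-qualified so that
# dplyr::select() cannot mask it when tidyverse is attached)
cds_df <- AnnotationDbi::select(txdb, keys = keys,
                                columns = c("CDSID", "CDSSTART", "CDSEND", "GENEID"),
                                keytype = "TXNAME")

# Regroup the flat table into a named list with one data.frame per transcript
cds_df_by_tx <- split(cds_df, cds_df$TXNAME)
names(cds_df_by_tx)
```

The result is a plain list of data.frames rather than a GRangesList, so it is convenient for inspection but loses the range-based operations.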

TxDb GenomicFeatures genomics R • 775 views

Are you just trying to get a GRanges object with all CDSs and additional info like tx_id and such?
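If that is the goal, the non-grouped extractor cds() accepts a columns argument, which the "By" functions lack. A minimal sketch, assuming the sample database shipped with GenomicFeatures (hg19_knownGene_sample.sqlite) in place of the Gencode TxDb; txdb and cds_gr are illustrative names.

```r
library(GenomicFeatures)

# Assumption: the sample TxDb bundled with GenomicFeatures stands in for Gencode
txdb <- loadDb(system.file("extdata", "hg19_knownGene_sample.sqlite",
                           package = "GenomicFeatures"))

# One GRanges with every CDS part, plus the requested metadata columns.
# A CDS part can belong to several transcripts, so tx_id, tx_name and
# gene_id come back as list-like columns.
cds_gr <- cds(txdb, columns = c("cds_id", "tx_id", "tx_name", "gene_id"))

mcols(cds_gr)
```

The same columns argument is available in transcripts() and exons(), so the pattern carries over to those feature types.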


jon.klonowski Do not delete posts that have received feedback. Instead, interact with the people investing effort in your problem and if the problem resolved itself or their feedback helped, let everyone know.


It was a dumb question. Everything I needed was there; I just got a little scatterbrained, @Ram.


That's OK, it happens to everyone. Just leave a comment saying what now-so-obvious thing you missed and I guarantee you, someone else will miss the exact same thing and find that your comment saves them at least a few hours.

5 months ago
jon.klonowski ▴ 120

cdsBy() and exonsBy() return similar output, with one being CDS-specific and the other exon-specific. Other than the names of the metadata columns, the only real difference is that the ranges returned by cdsBy() cover the coding sequence, while the ranges returned by exonsBy() cover the full exons. So difficult to understand, wow.

I honestly don't know what I was thinking with this question.
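For anyone who lands here with the same confusion, the two outputs can be compared side by side for a single transcript. This is a rough sketch that assumes the sample database shipped with GenomicFeatures (hg19_knownGene_sample.sqlite) rather than the Gencode TxDb from the question; the variable names are illustrative.

```r
library(GenomicFeatures)

# Assumption: the sample TxDb bundled with GenomicFeatures stands in for Gencode
txdb <- loadDb(system.file("extdata", "hg19_knownGene_sample.sqlite",
                           package = "GenomicFeatures"))

cds_by_tx   <- cdsBy(txdb, by = "tx", use.names = TRUE)
exons_by_tx <- exonsBy(txdb, by = "tx", use.names = TRUE)

# Pick a transcript that has a CDS and look at both views of it
tx <- names(cds_by_tx)[1]
cds_by_tx[[tx]]    # metadata columns: cds_id, cds_name, exon_rank
exons_by_tx[[tx]]  # metadata columns: exon_id, exon_name, exon_rank

# Check that each CDS range sits inside an exon of the same transcript
cds_within_exons <- all(cds_by_tx[[tx]] %within% exons_by_tx[[tx]])
cds_within_exons

# Transcripts without a CDS appear only in the exon list
c(cds = length(cds_by_tx), exons = length(exons_by_tx))
```

The shared exon_rank column makes it possible to line up each CDS range with the exon it falls in.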