Question

How to choose output when using a "By" function from a TxDb object; GenomicFeatures R

0

2.3 years ago

jon.klonowski ▴ 150

I would like to extract information about coding sequences (CDS) from a TxDb object but am struggling to find a good solution. A TxDb object seems able to hold a lot of information, yet I see no way to extract it other than querying it as a database object; this is troublesome because it takes more compute time than manipulating a GenomicRanges object.

library(GenomicFeatures)
library(tidyverse)  # note: dplyr::select() now masks AnnotationDbi::select()

# Build the TxDb from the GENCODE GFF3 once, cache it as SQLite, then reload it
Gencode <- makeTxDbFromGFF("./gencode.v39.basic.annotation.gff3.gz")
saveDb(Gencode, file = "gencode.v39.basic.annotation.sqlite")
Gencode <- loadDb("gencode.v39.basic.annotation.sqlite")

# Get CDS by transcript:
CDSbyTx <- cdsBy(Gencode, by = "tx", use.names = TRUE)

In this command, there doesn't appear to be a way to specify which information you want returned; you are stuck with:

seqnames      ranges strand |    cds_id          cds_name exon_rank

Despite the fact that

> columns(Gencode)

returns 22 different columns.

I realize that I can use:

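CDSbyTx <- cdsBy(Gencode, by = "tx", use.names = TRUE)
keys <- names(CDSbyTx)
cols <- columns(Gencode)
# Namespace the call: with tidyverse attached, a bare select() resolves to dplyr::select()
AnnotationDbi::select(Gencode, keys = keys, columns = cols, keytype = "TXNAME")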

but this creates a data.frame in which the CDS are not grouped by transcript, and it all seems really roundabout.

Is there a workaround?

TxDb GenomicFeatures genomics R • 1.4k views

ADD COMMENT • link 2.2 years ago by jon.klonowski ▴ 150

0

Are you just trying to get a GRanges object with all CDSs and additional info like tx_id and such?

ADD REPLY • link 2.3 years ago by rpolicastro 13k

0

jon.klonowski Do not delete posts that have received feedback. Instead, interact with the people investing effort in your problem and if the problem resolved itself or their feedback helped, let everyone know.

ADD REPLY • link 2.3 years ago by Ram 43k

0

It was a dumb question. Everything I needed was there; I just got a little scatterbrained, @Ram.

ADD REPLY • link 2.3 years ago by jon.klonowski ▴ 150

1

That's OK, it happens to everyone. Just leave a comment saying what now-so-obvious thing you missed and I guarantee you, someone else will miss the exact same thing and find that your comment saves them at least a few hours.

ADD REPLY • link 2.3 years ago by Ram 43k

score 2 · Accepted Answer · 2022-01-25

cdsBy() and exonsBy() return similar output, with one being CDS-specific and the other exon-specific. Other than the names of the metadata columns, the only real difference is that the ranges returned by cdsBy() are the coding sequences, while the ranges returned by exonsBy() are the exons. So difficult to understand, wow.
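To make the comparison concrete, here is a minimal sketch. It assumes the small sample TxDb that ships with GenomicFeatures (hg19_knownGene_sample.sqlite) rather than the GENCODE database from the question, so the transcript names differ, but the structure of the output should be the same.

```r
# Sketch only: uses the sample TxDb bundled with GenomicFeatures as a stand-in
# (an assumption for illustration; substitute your own TxDb, e.g. Gencode).
suppressPackageStartupMessages(library(GenomicFeatures))

txdb <- loadDb(system.file("extdata", "hg19_knownGene_sample.sqlite",
                           package = "GenomicFeatures"))

# Both calls return a GRangesList with one element per transcript.
cds_by_tx  <- cdsBy(txdb,   by = "tx", use.names = TRUE)
exon_by_tx <- exonsBy(txdb, by = "tx", use.names = TRUE)

# The metadata columns differ only in their prefix (cds_* versus exon_*).
cds_cols  <- colnames(mcols(cds_by_tx[[1]]))
exon_cols <- colnames(mcols(exon_by_tx[[1]]))
print(cds_cols)
print(exon_cols)

# For one coding transcript, compare the span covered by its CDS
# with the span covered by its exons.
tx <- names(cds_by_tx)[1]
cds_span  <- range(cds_by_tx[[tx]])
exon_span <- range(exon_by_tx[[tx]])
print(cds_span)
print(exon_span)
```

Only transcripts that actually have a CDS appear in the cdsBy() result, so it can have fewer elements than the exonsBy() result.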

I honestly don't know what I was thinking with this question.
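For anyone who lands here still wanting extra metadata alongside the CDS ranges, the following is a rough sketch of three options, not a definitive recipe. It again assumes the sample TxDb bundled with GenomicFeatures as a stand-in for Gencode, and the particular columns requested are only examples.

```r
# Sketch only: the sample TxDb and the chosen columns are illustrative assumptions.
suppressPackageStartupMessages(library(GenomicFeatures))

txdb <- loadDb(system.file("extdata", "hg19_knownGene_sample.sqlite",
                           package = "GenomicFeatures"))

# Option 1: a flat GRanges of all CDS, asking for extra columns directly.
# tx_name and gene_id come back as list-like columns because one CDS
# can belong to several transcripts.
all_cds <- cds(txdb, columns = c("cds_id", "tx_name", "gene_id"))

# Option 2: keep the by-transcript grouping from cdsBy(), then flatten it
# and record which transcript each range came from.
cds_by_tx <- cdsBy(txdb, by = "tx", use.names = TRUE)
flat <- unlist(cds_by_tx, use.names = FALSE)
flat$tx_name <- rep(names(cds_by_tx), lengths(cds_by_tx))

# Option 3: the select() route, with the data.frame regrouped by transcript.
# The namespace prefix avoids a clash with dplyr::select() if tidyverse is attached.
df <- AnnotationDbi::select(txdb,
                            keys    = names(cds_by_tx),
                            columns = c("CDSID", "CDSSTART", "CDSEND", "GENEID"),
                            keytype = "TXNAME")
df_by_tx <- split(df, df$TXNAME)
```

Options 1 and 2 stay within GenomicRanges objects, which addresses the concern in the question about leaving that representation; option 3 is only useful when a per-transcript data.frame is really what is wanted.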