I would like to extract information about coding sequences (CDS) from a TxDb object but am struggling to find a good solution. A TxDb object can hold a lot of information, but there seems to be no way to extract it without going through the database interface; this is troublesome because it takes more computational time than manipulating a GenomicRanges object.
library('GenomicFeatures')
library(tidyverse)

Gencode <- makeTxDbFromGFF("./gencode.v39.basic.annotation.gff3.gz")
saveDb(Gencode, file = "gencode.v39.basic.annotation.sqlite")
Gencode <- loadDb("gencode.v39.basic.annotation.sqlite")

# Get CDS by transcript:
CDSbyTx <- cdsBy(Gencode, by = "tx", use.names = TRUE)
With this command, there doesn't appear to be a way to specify which information you want returned; you are stuck with:
seqnames ranges strand | cds_id cds_name exon_rank
Despite the fact that
columns(Gencode)
returns 22 different results.
I realize that I can use:
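CDSbyTx <- cdsBy(Gencode, by = "tx", use.names = TRUE)
keys <- names(CDSbyTx)
cols <- columns(Gencode)
# tidyverse is attached after GenomicFeatures, so a bare select() resolves to
# dplyr::select(); call the AnnotationDbi version explicitly.
AnnotationDbi::select(Gencode, keys = keys, columns = cols, keytype = "TXNAME")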
but this creates a data.frame where the CDS are not grouped by transcript, and it just seems really roundabout.
Is there a workaround?
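One possible way to regroup the select() output by transcript, as a minimal sketch: it assumes the data.frame contains the TXNAME, CDSCHROM, CDSSTART, CDSEND and CDSSTRAND columns, and it uses a small made-up data.frame (hypothetical transcript names and coordinates) in place of the real query result so that it runs on its own. The idea is to turn the rows into a GRanges that keeps the extra columns as metadata, then split it by transcript name.

```r
library(GenomicRanges)

# Toy stand-in for the data.frame returned by select(); all values are made up.
df <- data.frame(
  TXNAME    = c("ENST_B", "ENST_A", "ENST_A", "ENST_C"),
  CDSID     = c(3L, 1L, 2L, NA),
  CDSCHROM  = c("chr1", "chr1", "chr1", NA),
  CDSSTRAND = c("-", "+", "+", NA),
  CDSSTART  = c(500L, 100L, 300L, NA),
  CDSEND    = c(650L, 200L, 400L, NA),
  EXONRANK  = c(1L, 1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Rows with NA coordinates (here standing in for transcripts without a CDS)
# cannot be turned into ranges, so drop them first.
df <- df[!is.na(df$CDSSTART), ]

# Build a GRanges from the CDS coordinate columns; every other column
# (TXNAME, CDSID, EXONRANK) is kept as a metadata column.
gr <- makeGRangesFromDataFrame(
  df,
  seqnames.field     = "CDSCHROM",
  start.field        = "CDSSTART",
  end.field          = "CDSEND",
  strand.field       = "CDSSTRAND",
  keep.extra.columns = TRUE
)

# Group the ranges by transcript name, giving a GRangesList
# with one element per transcript.
cds_by_tx <- split(gr, gr$TXNAME)
cds_by_tx
```

If the order of CDS within a transcript matters, the data.frame can be sorted by TXNAME and EXONRANK before the conversion.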