Question: Trying to extract gene_type from GeneCode GTF file
0
gravatar for noa.rappaport
11 weeks ago by
noa.rappaport0 wrote:

Hi, I'm trying to filter my analysis to only specific gene types ( protein coding, link, ribosomal, rRNA). I'm starting from RNA seq counts. The GTF file is downloaded from: ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/gencode.v24.annotation.gtf.gz Here is the code I have until now, and I'd like to add "gene_type" column to the output file:

getbiocGet("GenomicRanges")
biocGet("rtracklayer")
biocGet("genoset")

library(GenomicRanges)
library(rtracklayer)
library(genoset)

GTFfile = "./gencode.v24.annotation.gtf" 
GTF <- import.gff(con=GTFfile , format="gtf", genome="GRCh38.p5", feature.type="exon")

#Load the annotation and reduce it to get gene length
#GTF <- import.gff(GTFfile, format="gtf", genome="GRCm38.71", asRangedData=F, feature.type="exon")
grl <- reduce(split(GTF, elementMetadata(GTF)$gene_id))
reducedGTF <- unlist(grl, use.names=T)
elementMetadata(reducedGTF)$gene_id <- rep(names(grl), elementNROWS(grl))
elementMetadata(reducedGTF)$widths <- width(reducedGTF)
calc_length <- function(x) {
  sum(elementMetadata(x)$widths)
}
output <- sapply(split(reducedGTF, elementMetadata(reducedGTF)$gene_id), calc_length)
colnames(output) <- c("Length")

Thanks!

rna-seq R • 153 views
ADD COMMENTlink modified 9 weeks ago by i.sudbery6.9k • written 11 weeks ago by noa.rappaport0
0
gravatar for i.sudbery
9 weeks ago by
i.sudbery6.9k
Sheffield, UK
i.sudbery6.9k wrote:

I think you are making it hard for yourself by trying to process the GTF file in R. I would either download the table of biotypes from the ensembl website https://www.ensembl.org/biomart/martview and then load it into R.

or use the biomaRt Bioconductor package to fetch the table directly into R.

or if you REALLY wanted to process it from the GTF file, you can get it from the "gene_biiotype" column of the elementMetadata.

So the following code would fetch a table that had one column of gene_ids and another containing gene_biotype:

biotype_table <- elementMetadata(GTF)[ , c("gene_id", "gene_biotype")]
biotype_table <- biotype_table[!duplicated(biotype_table)]
ADD COMMENTlink written 9 weeks ago by i.sudbery6.9k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2262 users visited in the last hour