Question

Different results from featureCounts using gene_id and gene_biotype

0

Entering edit mode

4.4 years ago

CuriusScientist ▴ 50

Dear Community,

I have been running featureCounts to count mapped reads in RNA-Seq paired-end data with the following parameters

featureCounts -T 10 -t exon -s 2 -p -g gene_id -a <path_to_annotation.gtf> -o <path_to_outoput_directory> mapped.bam

I also desired to find the biotype associated so I ran featureCounts using the following command

featureCounts -T 10 -t exon -s 2 -p -g gene_biotype -a <path_to_annotation.gtf> -o <path_to_outoput_directory> mapped.bam

comparing the MultiQC from the output of both commands resulted in contradictory output, with less number of reads assigned using Gene_Id in comparison to Gene_biotype.

Gene_Biotype Assignment by number of Reads

Gene_Biotype Assignment by Percentage

Gene_Id Assignment by number of Reads

Gene_Id Assignment by Percentage

My assumption was that in both cases the number of reads assigned will be same and then I can deduce the gene biotype for those assigned reads.

what am I missing here?

thanks in advance.

RNA-Seq featureCounts subread readcounts biotype • 2.8k views

ADD COMMENT • link updated 4.4 years ago by i.sudbery 19k • written 4.4 years ago by CuriusScientist ▴ 50

i.sudbery · Answer 1 · 2019-12-11

0

Entering edit mode

4.4 years ago

i.sudbery 19k

My guess is that featureCounts does not count reads if they could be assigned to two different features. Where you have two genes overlapping it will not count those reads. Imagine two overlapping non-coding genes. When counting by gene_id, reads in the overlap would not be counted. However, with counting by biotype, they would, because even though they map to two transcripts, those transcripts both have the same biotype.

ADD COMMENT • link 4.4 years ago by i.sudbery 19k

0

Entering edit mode

makes sense. is there a way to work around this situation using featureCounts?

the only way I can think of is using the following code

library(Biostrings)
library(biomaRt)
ensembl <- useMart("ensembl",dataset="hsapiens_gene_ensembl")
my.genes <- rownames(GeneCounts)

out <- getBM(attributes=c("ensembl_gene_id", "gene_biotype"), 
             filters="ensembl_gene_id", values=my.genes, mart=ensembl)
out <- out[match(my.genes, out$ensembl_gene_id),]
out <- na.omit(out) # removes genes which don't match any biotype

ADD REPLY • link updated 4.4 years ago by i.sudbery 19k • written 4.4 years ago by CuriusScientist ▴ 50

0

Entering edit mode

Yes. This, or something like it is what I would recommend.

ADD REPLY • link 4.4 years ago by i.sudbery 19k