Different results from featureCounts using gene_id and gene_biotype
1
0
Entering edit mode
24 months ago

Dear Community,

I have been running featureCounts to count mapped reads in RNA-Seq paired-end data with the following parameters

featureCounts -T 10 -t exon -s 2 -p -g gene_id -a <path_to_annotation.gtf> -o <path_to_outoput_directory> mapped.bam

I also desired to find the biotype associated so I ran featureCounts using the following command

featureCounts -T 10 -t exon -s 2 -p -g gene_biotype -a <path_to_annotation.gtf> -o <path_to_outoput_directory> mapped.bam

comparing the MultiQC from the output of both commands resulted in contradictory output, with less number of reads assigned using Gene_Id in comparison to Gene_biotype.

Gene_Biotype Assignment by number of Reads

Gene_Biotype Assignment by Percentage

Gene_Id Assignment by number of Reads

Gene_Id Assignment by Percentage

My assumption was that in both cases the number of reads assigned will be same and then I can deduce the gene biotype for those assigned reads.

what am I missing here?

thanks in advance.

RNA-Seq featureCounts subread readcounts biotype • 1.1k views
ADD COMMENT
0
Entering edit mode
24 months ago

My guess is that featureCounts does not count reads if they could be assigned to two different features. Where you have two genes overlapping it will not count those reads. Imagine two overlapping non-coding genes. When counting by gene_id, reads in the overlap would not be counted. However, with counting by biotype, they would, because even though they map to two transcripts, those transcripts both have the same biotype.

ADD COMMENT
0
Entering edit mode

makes sense. is there a way to work around this situation using featureCounts?

the only way I can think of is using the following code

library(Biostrings)
library(biomaRt)
ensembl <- useMart("ensembl",dataset="hsapiens_gene_ensembl")
my.genes <- rownames(GeneCounts)

out <- getBM(attributes=c("ensembl_gene_id", "gene_biotype"), 
             filters="ensembl_gene_id", values=my.genes, mart=ensembl)
out <- out[match(my.genes, out$ensembl_gene_id),]
out <- na.omit(out) # removes genes which don't match any biotype
ADD REPLY
0
Entering edit mode

Yes. This, or something like it is what I would recommend.

ADD REPLY

Login before adding your answer.

Traffic: 2743 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6