Question

featureCounts: count differences between an Ensembl GTF and the built-in Entrez Gene annotation

4.8 years ago

harrydolan.dc ▴ 20

Hi everyone

First, thank you in advance for any help you can give. I am new to bioinformatics and Biostars has been immensely helpful. I am currently processing human RNA-seq data: I have finished the trimming and alignment (with STAR) stages and have just used featureCounts to count the reads.

I have tried two different annotations with featureCounts. Both runs worked, but they gave different counts. First I used the GRCh38 (hg38) GTF from Ensembl, and second I used the built-in hg38 annotation from the Rsubread package (Entrez Gene IDs).
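
For reference, here is a minimal sketch of the two runs using the command-line featureCounts from Subread, written in Python so the commands can be inspected before running. The file names are hypothetical: `Homo_sapiens.GRCh38.gtf` stands for the Ensembl GTF, and `hg38_RefSeq_exon.txt` stands for the SAF-format hg38 annotation that, as far as I know, ships in the Subread `annotation/` directory (check your own install). In Rsubread the equivalent would be `annot.inbuilt = "hg38"` versus `annot.ext = ...` with `isGTFAnnotationFile = TRUE`. This is a single-end invocation; paired-end data needs extra options. Only the `-a` and `-F` arguments change between the two runs.

```python
import shlex


def featurecounts_cmd(bam, annotation, out, fmt="GTF", threads=4):
    """Build a featureCounts command; the two runs differ only in the annotation file and its format."""
    if fmt not in ("GTF", "SAF"):
        raise ValueError("featureCounts -F accepts 'GTF' (or compatible GFF) or 'SAF'")
    return ["featureCounts", "-T", str(threads), "-F", fmt, "-a", annotation, "-o", out, bam]


# Hypothetical paths: replace with your own BAM and annotation files.
runs = {
    "ensembl": featurecounts_cmd("sample.bam", "Homo_sapiens.GRCh38.gtf", "counts_ensembl.txt"),
    "entrez": featurecounts_cmd("sample.bam", "hg38_RefSeq_exon.txt", "counts_entrez.txt", fmt="SAF"),
}

if __name__ == "__main__":
    # Only prints the commands; paste them into a shell or pass each list to subprocess.run().
    for name, cmd in runs.items():
        print(f"{name}: {shlex.join(cmd)}")
```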

Both runs completed successfully, but when I compared corresponding genes between the Ensembl and Entrez Gene results, the counts were quite different. The total number of counted reads also differed: 36165730 with Ensembl versus 38752850 with Entrez Gene.
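
To put the two totals in perspective, here is a small calculation that uses only the numbers above. At the library level the Entrez Gene run counted roughly 7% more reads. Individual genes can differ far more than this overall figure.

```python
def relative_difference(a, b):
    """Return the absolute difference b - a and that difference as a percentage of a."""
    diff = b - a
    return diff, 100.0 * diff / a


ensembl_total = 36165730  # total counted with the Ensembl GTF
entrez_total = 38752850   # total counted with the built-in Entrez Gene annotation
diff, pct = relative_difference(ensembl_total, entrez_total)
print(f"{diff:,} more reads ({pct:.1f}%) with the Entrez Gene annotation")
# prints: 2,587,120 more reads (7.2%) with the Entrez Gene annotation
```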

Why would the total number of counts be higher with Entrez Gene? That seems strange, considering that Ensembl is larger in scope.
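
One way to see where the extra reads come from is the `.summary` file that featureCounts writes next to each counts table (for example `counts_ensembl.txt.summary`). It splits reads into `Assigned` and several `Unassigned_*` categories. One plausible explanation, not verified here, involves multi-gene overlaps. By default featureCounts does not count a read that overlaps more than one gene and reports it as `Unassigned_Ambiguity` instead. A broader annotation with more overlapping genes can therefore leave more reads unassigned, even though it covers more of the genome. The sketch below compares the categories of two summary files side by side. The file names in the usage comment are hypothetical, and only the first sample column is read.

```python
import csv
import io
import sys


def parse_summary(text):
    """Parse a featureCounts .summary file into {status: count} for the first sample column."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # header row: "Status" followed by one column per BAM file
    return {row[0]: int(row[1]) for row in reader if row}


def compare_summaries(a, b):
    """Return {status: (count_a, count_b, count_b - count_a)}, skipping statuses that are zero in both runs."""
    table = {}
    for status in sorted(set(a) | set(b)):
        x, y = a.get(status, 0), b.get(status, 0)
        if x or y:
            table[status] = (x, y, y - x)
    return table


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names):
    #   python compare_summary.py counts_ensembl.txt.summary counts_entrez.txt.summary
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        table = compare_summaries(parse_summary(fa.read()), parse_summary(fb.read()))
    for status, (x, y, d) in table.items():
        print(f"{status:30s} {x:>12,} {y:>12,} {d:>+12,}")
```

The `Assigned` row corresponds to the totals being compared. The `Unassigned_NoFeatures` and `Unassigned_Ambiguity` rows show whether reads were lost because they fell outside any annotated gene or because they overlapped several.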

I understand that the Ensembl and Entrez Gene annotations do not match completely, but the differences seemed quite dramatic. Is this normal? And is it okay to use the Entrez Gene counts, considering that I aligned my data using an Ensembl GTF?
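
Per-gene counts from the two runs can only be compared after Ensembl gene IDs are mapped to Entrez Gene IDs. Here is a minimal sketch of that comparison. It assumes you already have an Ensembl-to-Entrez mapping as a Python dict with one Entrez ID per Ensembl ID, for example exported from BioMart. The mapping itself is not shown, and unmapped Ensembl IDs are simply dropped. Ensembl genes that map to the same Entrez ID are summed. For genes present in both runs, the function returns a log2 ratio with a pseudocount of 1. `parse_counts` takes the last column of a featureCounts counts table, so it assumes one BAM per table. All function names are illustrative.

```python
import math


def parse_counts(lines):
    """Read featureCounts counts-table lines into {gene_id: count}, using the last column."""
    counts = {}
    for line in lines:
        if line.startswith("#") or line.startswith("Geneid"):
            continue  # skip the program/command comment line and the column header
        fields = line.rstrip("\n").split("\t")
        counts[fields[0]] = int(fields[-1])
    return counts


def compare_by_entrez(ensembl_counts, entrez_counts, ens_to_entrez):
    """Collapse Ensembl counts onto Entrez IDs, then return log2((entrez + 1) / (ensembl + 1)) per shared gene."""
    collapsed = {}
    for ens_id, n in ensembl_counts.items():
        entrez_id = ens_to_entrez.get(ens_id)  # hypothetical mapping; None means unmapped
        if entrez_id is not None:
            collapsed[entrez_id] = collapsed.get(entrez_id, 0) + n  # many-to-one: sum the counts
    shared = set(collapsed) & set(entrez_counts)
    return {g: math.log2((entrez_counts[g] + 1) / (collapsed[g] + 1)) for g in shared}
```

Usage, with hypothetical file names: open `counts_ensembl.txt` and `counts_entrez.txt`, pass each file handle to `parse_counts`, and hand both dicts plus the mapping to `compare_by_entrez`. Ratios far from 0 point to genes whose definitions differ between the two annotations.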

featureCounts RNA-Seq sequencing • 1.7k views

updated 2.1 years ago by Ram 45k • written 4.8 years ago by harrydolan.dc ▴ 20

I think you might find the following manuscript helpful: A comprehensive evaluation of ensembl, RefSeq, and UCSC annotations in the context of RNA-seq read mapping and gene quantification

4.8 years ago by newbio17 ▴ 370