Question: htseq read counts
3.7 years ago by
gudraephouto10 wrote:

I am very new to RNA-seq analysis. I have sent some FASTQ files (Homo sapiens) from EBI to Galaxy. To obtain read counts, I aligned them to hg19 using HiSat (default parameters). Since my reference genome was hg19, I then used the GTF file from Gencode (Version 19 (July 2013 freeze, GRCh37) - Ensembl 74, 75) to obtain read counts with htseq.

The total number of counts assigned to features is 10347508, which seems OK. However, a number of reads were not counted for any feature:

__no_feature 2362227
__ambiguous 788874
__too_low_aQual 1001993
__not_aligned 2517255
__alignment_not_unique 3866370
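
To put these numbers in proportion, here is a minimal Python sketch that works out the share of each category. It assumes the categories are mutually exclusive, so that the assigned counts plus the special counters add up to the total number of reads; the variable names are only illustrative.

```python
# Counts copied from the htseq output above.
assigned = 10347508
special = {
    "__no_feature": 2362227,
    "__ambiguous": 788874,
    "__too_low_aQual": 1001993,
    "__not_aligned": 2517255,
    "__alignment_not_unique": 3866370,
}

# Assumption: every read falls into exactly one category.
total = assigned + sum(special.values())

# Fraction of the total for each category, including the assigned reads.
fractions = {name: count / total for name, count in special.items()}
fractions["assigned"] = assigned / total

# Print the categories from largest to smallest share.
for name, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{name:25s}{frac:7.1%}")
```

Under that assumption, roughly half of all reads end up assigned to a feature.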

Do you think the result is reasonable?

Something confusing: of the 57820 genes in total, the counts for genes up to gene 18356 are mostly non-zero, but the counts for genes 18356 to 57820 are mostly zero (only a few are non-zero).

Why is that?
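
The zero versus non-zero split can be tallied with a sketch like the one below. It assumes the usual two-column, tab-separated htseq output (feature ID, count) with the special counters prefixed by `__`; the sample lines and gene names are hypothetical placeholders, not my real data.

```python
import io

# Hypothetical excerpt in the two-column htseq output format;
# GENE_A..GENE_D are placeholder IDs, not real genes.
sample = io.StringIO(
    "GENE_A\t120\n"
    "GENE_B\t0\n"
    "GENE_C\t7\n"
    "GENE_D\t0\n"
    "__no_feature\t2362227\n"
)


def summarize(handle):
    """Return (nonzero, zero) gene tallies, skipping the '__' special counters."""
    nonzero = zero = 0
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        feature_id, count = line.split("\t")
        if feature_id.startswith("__"):
            continue  # special counter, not a gene
        if int(count) > 0:
            nonzero += 1
        else:
            zero += 1
    return nonzero, zero


nonzero, zero = summarize(sample)
print(f"non-zero genes: {nonzero}, zero genes: {zero}")
```

For a real run, `sample` would be replaced by the opened count file.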

Do you think I have to change my GTF file? Which version?

Or should I consider only the first 18356 genes for DE analysis?


gtf htseq • 1.5k views
modified 3.6 years ago by Biostar • written 3.7 years ago by gudraephouto