htseq read counts
8.8 years ago
gudraephouto ▴ 10

I am very new to RNA-seq analysis. I imported some FASTQ files (Homo sapiens) from EBI into Galaxy. To obtain read counts, I aligned them to hg19 using HISAT (default parameters). Since my reference genome was hg19, I then used the GENCODE GTF file (Version 19 (July 2013 freeze, GRCh37) - Ensembl 74, 75) to obtain read counts with htseq.

The total number of counts assigned to features is 10347508, which seems OK. However, I lost a number of reads to the following categories:

__no_feature 2362227
__ambiguous 788874
__too_low_aQual 1001993
__not_aligned 2517255
__alignment_not_unique 3866370
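To put these numbers in proportion, here is a minimal sketch that turns them into fractions. It assumes the five special counters and the assigned counts are disjoint and together cover everything htseq processed, which I have not verified for my run; the variable names are only for illustration.

```python
# Counts copied from the htseq output quoted in this post.
assigned = 10_347_508  # counts assigned to features

# Special counters reported by htseq at the end of its output.
special = {
    "__no_feature": 2_362_227,
    "__ambiguous": 788_874,
    "__too_low_aQual": 1_001_993,
    "__not_aligned": 2_517_255,
    "__alignment_not_unique": 3_866_370,
}

# Assumption: the categories are disjoint, so their sum is the total processed.
total = assigned + sum(special.values())

# Fraction of the total that falls into each category.
fractions = {name: n / total for name, n in special.items()}
fractions["assigned"] = assigned / total

for name, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{name:25s} {frac:6.1%}")
```

Under that assumption, a bit less than half of the total ends up assigned to a feature, and the largest loss is to non-unique alignments.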

Do you think the result is reasonable?

Something confusing: of the 57820 genes in total, the counts for genes up to about gene 18356 are mostly non-zero, but the counts for genes 18356 to 57820 are mostly zero (only a few are non-zero).
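For reference, this is roughly how the zero versus non-zero split can be tallied. It is a sketch that assumes the usual two-column, tab-separated htseq output (feature ID, then count) with the special counters prefixed by `__`; the function name `split_counts` and the gene IDs in the example are made up for illustration.

```python
import io


def split_counts(handle):
    """Return (nonzero, zero) gene tallies from htseq-style count lines."""
    nonzero = 0
    zero = 0
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        name, value = line.split("\t")
        # Skip the special counters such as __no_feature.
        if name.startswith("__"):
            continue
        if int(value) > 0:
            nonzero += 1
        else:
            zero += 1
    return nonzero, zero


# Hypothetical example input, not my real data.
example = io.StringIO("GENE_A\t12\nGENE_B\t0\nGENE_C\t3\n__no_feature\t5\n")
nonzero, zero = split_counts(example)
print(nonzero, zero)
```

On a real file, the same function could be called with `open("counts.txt")`, where the file name is a placeholder.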

Why is that?

Do you think I have to change my GTF file? Which version?

Or do you think I should consider only the first 18356 genes for DE analysis?

Thanks

htseq GTF • 2.1k views