I'm analysing RNA-Seq data with the final goal of calculating RPKM values. First I did a de novo assembly of the reads using the Trinity assembler. Then I used the Trinity output as the reference for bowtie2 and mapped the same reads (the ones used for the de novo assembly) against it. But when I ran htseq, it gave zero read counts for some Trinity contigs. Since I used the same reads for assembly and mapping, I shouldn't get a zero read count.
Here is the workflow:
Trinity -> bowtie2 -> htseq -> RPKM
Ideally I should not get a zero read count for any of the Trinity contigs, since all contigs were created from the same reads. I don't understand where it went wrong.
Have you checked the bottom of the output table? It gives the number of reads counted as __ambiguous, __not_aligned, ...
HTSeq is possibly more stringent than Trinity and may not have counted mapped reads for one of the reasons listed below. This could explain why some contigs have zero counts.
__no_feature: reads (or read pairs) which could not be assigned to any feature.
__ambiguous: reads (or read pairs) which could have been assigned to more than one feature and hence were not counted for any of these.
__too_low_aQual: reads (or read pairs) which were skipped due to the -a option (minimum alignment quality).
__not_aligned: reads (or read pairs) in the SAM file without alignment.
__alignment_not_unique: reads (or read pairs) with more than one reported alignment.
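
To see how many reads ended up in each of these categories, something like the following Python sketch could help. It assumes the htseq output is a two-column, tab-separated table (feature name, count) with the special counters prefixed by `__` at the bottom. The function name `split_htseq_counts` and the sample data are hypothetical, made up only for illustration.

```python
import io


def split_htseq_counts(handle):
    """Split a two-column htseq count table into feature counts and special counters."""
    features, special = {}, {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        name, count = line.split("\t")
        # Special counters such as __ambiguous start with a double underscore.
        target = special if name.startswith("__") else features
        target[name] = int(count)
    return features, special


# Made-up example data (assumption), not real results.
example = io.StringIO(
    "contig_1\t120\n"
    "contig_2\t0\n"
    "__no_feature\t15\n"
    "__ambiguous\t40\n"
    "__too_low_aQual\t5\n"
    "__not_aligned\t10\n"
    "__alignment_not_unique\t60\n"
)

features, special = split_htseq_counts(example)
zero_count = [name for name, n in features.items() if n == 0]
print("features with zero counts:", zero_count)
print("special counters:", special)
```

With a real count file you would pass `open("counts.txt")` (hypothetical filename) instead of the `io.StringIO` object. Large values for __ambiguous or __alignment_not_unique would suggest that reads did map but were discarded at the counting step.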
Also, you could check your BAM files with samtools idxstats to see whether bowtie2 actually mapped reads to all contigs.
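
A minimal sketch of that check in Python, assuming the idxstats output was saved to a text file (for example with `samtools idxstats aln.bam > idxstats.txt`, where both filenames are hypothetical). idxstats prints one tab-separated line per reference sequence: name, length, number of mapped reads, number of unmapped reads, with a final `*` line for reads that have no reference. The function name `contigs_without_reads` and the sample data are made up for illustration.

```python
import io


def contigs_without_reads(handle):
    """Return the names of reference sequences with zero mapped reads."""
    zero = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip anything that is not a 4-column idxstats line
        name, _length, mapped, _unmapped = fields
        if name == "*":
            continue  # summary line for reads without a reference
        if int(mapped) == 0:
            zero.append(name)
    return zero


# Made-up example data (assumption), not real results.
example = io.StringIO(
    "contig_1\t1500\t230\t0\n"
    "contig_2\t800\t0\t0\n"
    "*\t0\t0\t42\n"
)

unmapped_contigs = contigs_without_reads(example)
print(unmapped_contigs)
```

If a contig has mapped reads according to idxstats but a zero count from htseq, the reads were probably dropped at the counting step, for one of the reasons listed above, rather than at the mapping step.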