Full disclosure: I am really new to RNA sequencing.
I am using bowtie, tophat, and htseq to build a matrix of read counts for my samples. I built my reference genome from the "chromosomes" file downloaded from CGD. Everything seems to be going well so far.
My understanding is that there are 6620 total features in the haploid genome. My data set is diploid, so I expected roughly twice as many features (2 × 6620 = 13,240). However, my resulting counts matrix has approximately 12,800 rows. Since each row corresponds to a feature, shouldn't I see about 13,240 rows?
Numbers are from: http://www.candidagenome.org/cache/C_albicans_SC5314_genomeSnapshot.html
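One way to see where a number like ~12,800 comes from: htseq-count writes one row per distinct value of the `-i` attribute among annotation lines of the `-t` feature type (zero-count features included), followed by a few special summary rows whose names start with `__` (e.g. `__no_feature`). So the row count reflects the GTF/GFF annotation passed to htseq-count rather than the totals on the genome snapshot page. Below is a minimal Python sketch for counting both sides. It is not part of HTSeq: the function names are hypothetical, the defaults `exon`/`gene_id` are assumed to match htseq-count's defaults, and you would need to change them to whatever `-t`/`-i` values you actually used.

```python
def parse_attributes(field):
    """Parse column 9 of a GTF ('key "value";') or GFF3 ('key=value;') line into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        if "=" in part:  # GFF3 style: key=value
            key, _, value = part.partition("=")
        else:  # GTF style: key "value"
            key, _, value = part.partition(" ")
        attrs[key.strip()] = value.strip().strip('"')
    return attrs


def expected_rows(annotation_lines, feature_type="exon", id_attr="gene_id"):
    """Count distinct id_attr values among lines of feature_type.

    This approximates the number of non-'__' rows htseq-count reports
    for the same -t/-i settings (assumption: same annotation file).
    """
    ids = set()
    for line in annotation_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments/directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue  # not a full feature line, or a different feature type
        attrs = parse_attributes(cols[8])
        # htseq-count may complain about features lacking id_attr; here they are just skipped.
        if id_attr in attrs:
            ids.add(attrs[id_attr])
    return len(ids)


def count_rows(count_lines):
    """Split htseq-count output into (feature rows, special '__' summary rows)."""
    features, special = 0, 0
    for line in count_lines:
        if not line.strip():
            continue
        if line.startswith("__"):
            special += 1
        else:
            features += 1
    return features, special


if __name__ == "__main__":
    # Hypothetical toy data, not real C. albicans annotation.
    gtf = [
        'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA-T";\n',
        'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA-T";\n',
        'chr2\tsrc\texon\t100\t200\t.\t-\t.\tgene_id "geneB"; transcript_id "geneB-T";\n',
        'chr2\tsrc\tCDS\t100\t200\t.\t-\t0\tgene_id "geneC";\n',
    ]
    counts = ["geneA\t10\n", "geneB\t3\n", "__no_feature\t5\n", "__ambiguous\t0\n"]
    print(expected_rows(gtf))  # 2: geneA (two exons, one ID) and geneB; geneC is CDS only
    print(count_rows(counts))  # (2, 2)
```

To check a real run, calling `expected_rows(open("annotation.gff"), feature_type=..., id_attr=...)` and `count_rows(open("counts.txt"))` (placeholder file names) would give the two numbers to compare. If they agree, the gap from the expected total comes from how the annotation file defines features, not from the alignment or counting steps.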