I'm trying to extract raw read counts from a .bam file (output by TopHat2) for use in DESeq2, using HTSeq (htseq-count). HTSeq runs without errors, but every gene gets a count of 0: all of the reads end up under __no_feature or __alignment_not_unique. The reads are 100 bp long, so I'm surprised that none of them are assigned to a gene.
I input something like:
htseq-count -f bam -r pos /path/on/cluster/mySampleBam/accepted_hits.bam /path/on/cluster/homoSapGRCh38.gtf
And the output file looks like:
...
ENSG00000282813 0
ENSG00000282814 0
ENSG00000282815 0
ENSG00000282816 0
ENSG00000282817 0
ENSG00000282818 0
ENSG00000282819 0
ENSG00000282820 0
ENSG00000282821 0
ENSG00000282822 0
__no_feature 16996078
__ambiguous 0
__too_low_aQual 0
__not_aligned 0
__alignment_not_unique 2579821
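To make sure I'm reading that output correctly, the gene rows can be tallied separately from the summary rows (the ones whose name starts with "__"). This is only a minimal sketch: the function name is hypothetical, and it assumes each row is an identifier followed by whitespace and an integer count, as in the excerpt above.

```python
def summarize_htseq_counts(lines):
    """Split htseq-count output rows into a gene total and the '__' summary rows.

    `lines` is any iterable of text lines (for example an open file).
    Rows that are not 'name <whitespace> integer' are skipped.
    """
    gene_total = 0
    special = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank lines or anything that is not a two-column row
        name, value = fields
        try:
            count = int(value)
        except ValueError:
            continue  # second column is not a count
        if name.startswith("__"):
            special[name] = count  # e.g. __no_feature, __alignment_not_unique
        else:
            gene_total += count  # reads assigned to an actual gene
    return gene_total, special
```

Passing an open file handle for the counts file returns the number of reads assigned to genes and a dictionary of the summary counters, so a gene total of 0 next to a large __no_feature value is easy to spot.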
Has anyone had this issue? What might be going on?
I previously sorted the .bam file I'm trying to generate counts for with the command:
samtools sort accepted_hits.bam accepted_hits_sorted
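One thing I have not ruled out (an assumption on my part, not something confirmed for this data) is a mismatch between the sequence names in the BAM header and those in the GTF, for example "chr1" versus "1". As far as I understand, reads on sequences the GTF never mentions would all be counted under __no_feature. Below is a minimal sketch of how the two sets of names could be compared. It assumes the header has first been dumped to a text file with `samtools view -H`, and all function names are hypothetical.

```python
def bam_header_seqnames(header_lines):
    """Collect the SN: values from the @SQ lines of a SAM/BAM header dump."""
    names = set()
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def gtf_seqnames(gtf_lines):
    """Collect the first column (sequence name) of every non-comment GTF line."""
    names = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        names.add(line.split("\t", 1)[0].strip())
    return names


def compare_seqnames(header_lines, gtf_lines):
    """Return the names shared by both files and those found in only one of them."""
    bam = bam_header_seqnames(header_lines)
    gtf = gtf_seqnames(gtf_lines)
    return {"shared": bam & gtf, "bam_only": bam - gtf, "gtf_only": gtf - bam}
```

Both functions take any iterable of lines, so open file handles for the header dump and the GTF can be passed directly without loading the whole annotation into memory. An empty "shared" set would point to a naming mismatch rather than a problem with the alignments themselves.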