Hello,
I'm using HTSeq-count to determine the number of transcripts for each sample. I have bam files,samtools flagstat analysis showed ~80% of the transcripts are aligned to the reference genome. Then I sorted those bam files by name and position, used the exon annotation gff3 file from sorghum phytozome v3.1 to determine the transcript counts using HTSeq-count. After analyzing my data, I realized no_feature was too high, I have a total of ~30 -36 million paired end reads and 15 million shows no_feature
.
code:
htseq-count --stranded=no --type=exon --idattr=Parent --order=pos --format=bam --minaqual=30 --mode=union $bam $gff3 >$out
__no_feature 15928275
__ambiguous 3020434
__too_low_aQual 768292
__not_aligned 5910873
__alignment_not_unique 1756250
Further, I looked at the data and found that all the featured were on 6 chromosomes and 4 (chr3,4,6 and 7) were completely absent in the HTSeq counts data. I looked at the gff3 file and found that exon data was present for all the chromosomes.
Can any please help me in fixing this issue?
Thanks in advance,
Sandeep
That's certainly the weirdest htseq-count issue I've heard of. Perhaps featureCounts will produce better results? It's much faster anyway.
Are the chromosome names exactly the same between the gff file and the bam file ? It's the only reason I can think of to explain this issue...