I'm using HTSeq-count to determine the number of transcripts for each sample. I have bam files,samtools flagstat analysis showed ~80% of the transcripts are aligned to the reference genome. Then I sorted those bam files by name and position, used the exon annotation gff3 file from sorghum phytozome v3.1 to determine the transcript counts using HTSeq-count. After analyzing my data, I realized no_feature was too high, I have a total of ~30 -36 million paired end reads and 15 million shows no_feature.
code: htseq-count --stranded=no --type=exon --idattr=Parent --order=pos --format=bam --minaqual=30 --mode=union $bam $gff3 >$out
Further, I looked at the data and found that all the featured were on 6 chromosomes and 4 (chr3,4,6 and 7) were completely absent in the HTSeq counts data. I looked at the gff3 file and found that exon data was present for all the chromosomes.
Can any please help me in fixing this issue?
Thanks in advance,