htseq-count output file has a high number of __not_aligned reads
6 months ago
Nemo • 0

I aligned my RNA-seq reads against the human genome GRCh38.p13 and am now using htseq-count to count reads per gene. This is the command I am using:

htseq-count ./sorted-bams/f.bam ./gencode.v41.chr_patch_hapl_scaff.annotation.gff3.gz >  ./htseq/f.txt


In the output file, I got:

__no_feature            2280226
__ambiguous             244
__too_low_aQual         3761161
__not_aligned           34990259
__alignment_not_unique  0
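
To put these counters in proportion, here is a minimal sketch that computes each one as a share of the special counters only. It assumes nothing beyond the five numbers above. The per-gene counts from the same output file are not shown in the post, so the shares are relative to the reads that were not assigned to a gene, not to all reads.

```python
# Special counters copied from the htseq-count output above.
summary = {
    "__no_feature": 2280226,
    "__ambiguous": 244,
    "__too_low_aQual": 3761161,
    "__not_aligned": 34990259,
    "__alignment_not_unique": 0,
}

# Total over the special counters only; gene-level counts are not included
# because they are not shown in the post.
total_special = sum(summary.values())

for name, count in summary.items():
    share = count / total_special
    print(f"{name:<24}{count:>12,}  {share:6.1%}")

print(f"{'total (special only)':<24}{total_special:>12,}")
```

To get shares of all reads, add the sum of the per-gene counts from the same file to the denominator.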


Are these numbers reasonable?

read htseq-count human counts genome alignment • 357 views

How are your reads distributed in terms of length, and what is their quality? Which software did you use to align them, and with which parameters? Without these details there is no way to investigate the cause.


Thanks, Shred, for your response. For the alignment I am using GATK DRAGEN Map. The command I am using does not set any specific parameters; it is as follows:

dragen-os -r /human -1 R1 -2 R2 > /samFiles/sample.sam


I'm not sure how I can get the other information you asked for. With samtools view, maybe?


I'm not sure how DRAGEN writes the alignment file or whether there are any incompatibilities with quantification software such as htseq-count. Why not follow the Illumina protocol for the quantification as well?

Before doing any kind of analysis, you need to check the quality of the sequencing files. Run FastQC on them to see the read length distribution, sequencing quality, and other features.
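
As a rough cross-check on the alignment side, the sketch below tallies SAM records by the two properties behind `__not_aligned` and `__too_low_aQual`: the unmapped bit (0x4) in the FLAG column and the MAPQ column compared against a threshold. The threshold of 10 matches the default of htseq-count's `-a` option. The function name `tally_sam` and the three example records are hypothetical, made up for illustration. The tally is per SAM record, whereas htseq-count counts reads or read pairs, so the numbers will not match exactly for paired-end data.

```python
from collections import Counter


def tally_sam(lines, min_mapq=10):
    """Tally unmapped and low-MAPQ records, plus read lengths, from SAM text lines."""
    stats = Counter()
    lengths = Counter()
    for line in lines:
        # Skip blank lines and header lines (those start with "@").
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # column 2: bitwise FLAG
        mapq = int(fields[4])   # column 5: mapping quality
        seq = fields[9]         # column 10: read sequence ("*" if absent)
        stats["total"] += 1
        if flag & 0x4:          # 0x4 = segment unmapped
            stats["unmapped"] += 1
        elif mapq < min_mapq:
            stats["low_mapq"] += 1
        if seq != "*":
            lengths[len(seq)] += 1
    return stats, lengths


# Hypothetical records for illustration only, not taken from the poster's data.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "r3\t16\tchr1\t200\t3\t4M\t*\t0\t0\tTTTT\tIIII",
]

stats, lengths = tally_sam(example)
print(dict(stats))
print(dict(lengths))
```

On real data, the same function could be fed the output of `samtools view -h` through `sys.stdin` instead of the example list. `samtools flagstat` reports the mapped and unmapped totals directly.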