htseq-count output file has a high number of __not_aligned reads
19 months ago
Nemo • 0

I aligned my RNA sequencing reads against the human genome GRCh38.p13, and I am now using htseq-count to count the reads per gene. This is the command I am using:

htseq-count ./sorted-bams/f.bam ./gencode.v41.chr_patch_hapl_scaff.annotation.gff3.gz >  ./htseq/f.txt

In the output file, I got:

__no_feature	2280226
__ambiguous	244
__too_low_aQual	3761161
__not_aligned	34990259
__alignment_not_unique	0

Are these numbers reasonable?
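
To put these counters in proportion, here is a minimal Python sketch that expresses each special counter as a share of their sum. The per-gene counts are not shown in the post, so the denominator here covers only the five special counters, not all reads; the function name `counter_shares` is made up for this illustration.

```python
# Special counters copied from the htseq-count output quoted above.
special = {
    "__no_feature": 2280226,
    "__ambiguous": 244,
    "__too_low_aQual": 3761161,
    "__not_aligned": 34990259,
    "__alignment_not_unique": 0,
}


def counter_shares(counters):
    """Return each counter as a fraction of the sum of the counters given.

    Assumption: gene-level counts are not included, so these are shares
    of the special counters only, not of the whole library.
    """
    total = sum(counters.values())
    if total == 0:
        return {name: 0.0 for name in counters}
    return {name: value / total for name, value in counters.items()}


shares = counter_shares(special)
for name, share in shares.items():
    print(f"{name}\t{special[name]}\t{share:.1%}")
```

For the numbers above, `__not_aligned` accounts for about 85% of the special counters on its own, which suggests the issue lies in the alignment step rather than in the annotation.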

read htseq-count human counts genome alignment • 722 views

How are your reads distributed in terms of length and quality? Which software did you use to align them, and with which parameters? Without these details, there is no way to investigate the cause.


Thanks, Shred, for your response. Regarding software, I am using GATK DRAGEN Map for alignment. The command I am using does not set any specific parameters:

dragen-os -r /human -1 R1 -2 R2 > /samFiles/sample.sam

I'm not sure how I can get the other information you asked for. With samtools view, maybe?


I'm not sure how DRAGEN handles the alignment file, or whether there are any incompatibilities with quantification software such as htseq-count. Why not follow the Illumina protocol for the quantification as well?
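
One way to see how the aligner flagged the reads is to tally the FLAG and MAPQ columns of the SAM records directly. The sketch below is a simplification, not a reimplementation of htseq-count: it only looks at the unmapped flag bit (0x4) and at MAPQ against a threshold of 10, which is htseq-count's default for `-a`/`--minaqual`, and it ignores paired-end handling and the NH tag. The function name `tally_sam` and the demo records are made up for illustration.

```python
UNMAPPED = 0x4   # SAM FLAG bit: segment unmapped
MIN_MAPQ = 10    # htseq-count's default for -a / --minaqual


def tally_sam(lines, min_mapq=MIN_MAPQ):
    """Count unmapped, low-MAPQ and remaining records in SAM text lines."""
    counts = {"unmapped": 0, "low_mapq": 0, "other": 0}
    for line in lines:
        # Skip blank lines and header lines (header lines start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & UNMAPPED:
            counts["unmapped"] += 1
        elif mapq < min_mapq:
            counts["low_mapq"] += 1
        else:
            counts["other"] += 1
    return counts


# Hypothetical records, for illustration only (not taken from the thread).
demo = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r3\t16\tchr1\t200\t3\t50M\t*\t0\t0\t*\t*",
]
demo_counts = tally_sam(demo)
print(demo_counts)
```

On real data, the lines could come from the SAM file written by the aligner, or from the text output of `samtools view` on the sorted BAM. A large "unmapped" share here would point at the alignment itself rather than at htseq-count.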

Before doing any kind of analysis, you need to check the quality of the sequencing files. Run FastQC on them to see the read length distribution, sequencing quality and other features.
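
As a rough complement to a FastQC report, the read length distribution and mean base quality can also be computed with a few lines of Python. This is only a sketch under stated assumptions: four-line FASTQ records (no wrapped sequences), Phred+33 quality encoding, and helper names (`open_text`, `fastq_summary`) invented for this example.

```python
import gzip
import io
from collections import Counter


def open_text(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fastq_summary(handle, phred_offset=33):
    """Return (read-length histogram, mean base quality).

    Assumes 4-line FASTQ records and Phred+33 qualities by default.
    """
    lengths = Counter()
    qual_sum = 0
    base_count = 0
    while True:
        header = handle.readline()
        if not header.strip():       # end of file or trailing blank line
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()            # '+' separator line
        qual = handle.readline().rstrip("\n")
        lengths[len(seq)] += 1
        qual_sum += sum(ord(c) - phred_offset for c in qual)
        base_count += len(qual)
    mean_q = qual_sum / base_count if base_count else 0.0
    return lengths, mean_q


# Two hypothetical records, for illustration only.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\n!!!!!!\n")
demo_lengths, demo_mean_q = fastq_summary(demo)
print(dict(demo_lengths), demo_mean_q)
```

For a real file, something like `with open_text("R1.fastq.gz") as fh: fastq_summary(fh)` would work, where the file name is a placeholder. Very short reads or low mean quality would be one possible explanation for a large number of unaligned reads.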
