23 months ago
Nemo
I have aligned my RNA-seq reads against the human genome GRCh38.p13, and I am now using htseq-count to count the reads per gene. This is the command I am using:
htseq-count ./sorted-bams/f.bam ./gencode.v41.chr_patch_hapl_scaff.annotation.gff3.gz > ./htseq/f.txt
At the end of the output file, I got:
__no_feature 2280226
__ambiguous 244
__too_low_aQual 3761161
__not_aligned 34990259
__alignment_not_unique 0
Are these numbers reasonable?
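One way to put these counters in context is to express each of them as a fraction of everything htseq-count reported, including the reads that were assigned to genes. Below is a minimal Python sketch, assuming the output file has one name and one count per line, with the special counters prefixed by `__` as in the excerpt above. The function names `summarize_htseq` and `fractions` are hypothetical helpers, not part of HTSeq.

```python
import io


def summarize_htseq(handle):
    """Split htseq-count output into assigned reads and special counters.

    Assumes each non-empty line is "<name><whitespace><count>" and that
    special counters start with "__" (as in the output shown above).
    """
    assigned = 0
    special = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        # Split on the last run of whitespace so tabs or spaces both work.
        name, count = line.rsplit(None, 1)
        if name.startswith("__"):
            special[name] = int(count)
        else:
            assigned += int(count)
    return assigned, special


def fractions(assigned, special):
    """Return each category as a fraction of all reported reads."""
    total = assigned + sum(special.values())
    if total == 0:
        return {}
    result = {"assigned": assigned / total}
    result.update({name: n / total for name, n in special.items()})
    return result


# Example usage (path taken from the command in the question):
# with open("./htseq/f.txt") as fh:
#     assigned, special = summarize_htseq(fh)
# for name, frac in fractions(assigned, special).items():
#     print(f"{name}\t{frac:.1%}")
```

In the numbers posted above, `__not_aligned` is by far the largest special counter, so the fraction of assigned reads is the figure to look at first.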
How are your reads distributed in terms of length? And what about their quality? Which software did you use to align them? Which parameters were used for the alignment? Without these details there is no way to investigate the cause.
Thanks, Shred, for your response. Regarding software, I am using GATK DRAGEN Map for alignment. The command I am using does not set any specific parameters, as follows:
I'm not sure how I can get the other information you asked for. With samtools view, maybe?
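As a rough illustration of that idea, the text records printed by `samtools view` can be tallied with a few lines of Python. This is only a sketch under stated assumptions: the input is standard tab-separated SAM (FLAG in column 2, MAPQ in column 5, SEQ in column 10), and the function name `summarize_sam` and the script name `summarize_sam.py` are hypothetical. Secondary and supplementary records are counted like any other line, so a read can appear more than once.

```python
import sys
from collections import Counter


def summarize_sam(lines):
    """Tally read lengths, MAPQ values of mapped records, and unmapped records."""
    lengths = Counter()
    mapqs = Counter()
    unmapped = 0
    for line in lines:
        # Skip blank lines and header lines (present with `samtools view -h`).
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        flag = int(fields[1])
        seq = fields[9]
        if seq != "*":
            lengths[len(seq)] += 1
        if flag & 0x4:  # 0x4 is the "segment unmapped" flag bit
            unmapped += 1
        else:
            mapqs[int(fields[4])] += 1
    return lengths, mapqs, unmapped


# Pass "-" as the only argument to read SAM text from standard input.
if __name__ == "__main__" and sys.argv[1:] == ["-"]:
    lengths, mapqs, unmapped = summarize_sam(sys.stdin)
    print("unmapped records:", unmapped)
    print("read lengths:", sorted(lengths.items()))
    print("MAPQ values:", sorted(mapqs.items()))
```

It could be run as `samtools view ./sorted-bams/f.bam | python summarize_sam.py -`. The MAPQ tally is relevant to `__too_low_aQual`, since htseq-count skips reads whose alignment quality is below its `-a` threshold (10 by default).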
I'm not sure how DRAGEN writes the alignment file, or whether there are any incompatibilities with quantification software like htseq-count. Why not follow the Illumina protocol for the quantification as well?
Before doing any kind of analysis, you need to check the quality of the sequencing files. Run FastQC on them to see the read length distribution, the base qualities, and other features.