3.1 years ago
Hi,

I'm using HISAT2 to align reads to the genome. For a few samples, the alignment statistics reported by HISAT2 differ from those reported by bamqc from Qualimap.

Hisat2 output:

37317546 reads; of these:
37317546 (100.00%) were paired; of these:
14771091 (39.58%) aligned concordantly 0 times
7081700 (18.98%) aligned concordantly exactly 1 time
15464755 (41.44%) aligned concordantly >1 times
----
14771091 pairs aligned concordantly 0 times; of these:
1186424 (8.03%) aligned discordantly 1 time
----
13584667 pairs aligned 0 times concordantly or discordantly; of these:
27169334 mates make up the pairs; of these:
22785681 (83.87%) aligned 0 times
1973892 (7.27%) aligned exactly 1 time
2409761 (8.87%) aligned >1 times
69.47% overall alignment rate


For the same sample, the Qualimap bamqc results on the BAM file are as follows:

Reference

number of bases = 3,099,750,718 bp
number of contigs = 194

Globals

number of windows = 593

number of mapped reads = 179,886,195 (88.76%)

number of mapped paired reads (first in pair) = 90,666,939
number of mapped paired reads (second in pair) = 89,219,256
number of mapped paired reads (both in pair) = 171,622,685
number of mapped paired reads (singletons) = 8,263,510
number of mapped bases = 30,000,606,541 bp
number of sequenced bases = 8,238,989,876 bp
number of aligned bases = 0 bp
number of duplicated reads (estimated) = 95,761,476
duplication rate = 25.6%

Insert size

mean insert size = 29,714.41
std insert size = 464,081.65
median insert size = 1199

Mapping quality

mean mapping quality = 13.82

ACTG content

number of A's = 1,679,640,440 bp (20.39%)
number of C's = 2,133,982,067 bp (25.9%)
number of T's = 1,805,802,126 bp (21.92%)
number of G's = 2,619,565,243 bp (31.79%)
number of N's = 0 bp (0%)

GC percentage = 57.7%

Mismatches and indels

general error rate = 0
number of mismatches = 32,158,659
number of insertions = 876,201
mapped reads with insertion percentage = 0.49%
number of deletions = 174,885
mapped reads with deletion percentage = 0.1%
homopolymer indels = 24.62%


In the HISAT2 output the overall alignment rate is 69.47%, while in the bamqc results the number of mapped reads is about 88% (88.76%). Which one is right?

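Both metrics are right; they use different denominators. Of the entries in your BAM file, 88% are mapped. Of your input reads, only 69% are mapped (once or more than once). The multiple alignments cause the difference: a read that aligns to several locations counts once in the HISAT2 summary but contributes several entries to the BAM file.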

OK. And how can I get the percentage of unmapped reads? Does the 88% refer to reads that mapped only once?

The percentage of unmapped reads (compared to the total number of reads) is 30.53%.

The percentage of unmapped reads (compared to the total number of alignments in the bam file) is 12%.

Thank you. But could you please tell me how this total number of reads and total number of alignments are different? And could you also tell me how you calculated the above percentages?

But could you please tell me how this total number of reads and total number of alignments are different?

Because one read can have more than one alignment. In your HISAT2 summary, 41.44% of the pairs "aligned concordantly >1 times", and each of those alignments is a separate entry in the BAM file.
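To make the reads-versus-entries distinction concrete, here is a minimal sketch that tallies SAM records by their FLAG field. The toy records and the `count_records` helper are hypothetical, made up for illustration. The flag bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary) come from the SAM specification. It assumes the aligner marks additional alignments of a multi-mapped read as secondary, so check your own BAM before relying on this.

```python
def count_records(sam_lines):
    """Tally SAM records by type using the FLAG field (column 2)."""
    counts = {"primary_mapped": 0, "extra": 0, "unmapped": 0}
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        flag = int(line.split("\t")[1])
        if flag & 0x4:                    # read is unmapped
            counts["unmapped"] += 1
        elif flag & (0x100 | 0x800):      # secondary or supplementary alignment
            counts["extra"] += 1
        else:                             # primary alignment of a mapped read
            counts["primary_mapped"] += 1
    return counts


# Hypothetical toy data: r1 aligns at two loci, r2 once, r3 not at all.
toy = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t1\t50M\t*\t0\t0\t*\t*",
    "r1\t256\tchr2\t500\t1\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t900\t60\t50M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

counts = count_records(toy)
reads = counts["primary_mapped"] + counts["unmapped"]   # one record per input read
entries = reads + counts["extra"]                       # every record in the file

# Same data, two denominators, two different "mapped" percentages.
per_read = 100 * counts["primary_mapped"] / reads
per_entry = 100 * (counts["primary_mapped"] + counts["extra"]) / entries
print(f"mapped per read:  {per_read:.2f}%")
print(f"mapped per entry: {per_entry:.2f}%")
```

In this toy case two of three reads are mapped, but three of four file entries are mapped, which is the same effect as the 69.47% versus 88% gap.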

And could you also tell me how you calculated the above percentages?

100% - 69.47% = 30.53%, i.e. 1 - (number of reads that map at least once / total number of reads) = proportion of unmapped reads

100% - 88% = 12%, i.e. 1 - (number of mapped entries / total number of entries in the BAM file) = proportion of unmapped entries in the BAM file
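Both percentages can be reproduced from the numbers posted in the question. The sketch below is only a cross-check, under one labeled assumption: that the BAM file contains one entry per unaligned mate in addition to the mapped entries Qualimap reports. This is inferred from the figures agreeing, not taken from either tool's documentation.

```python
# Numbers copied from the HISAT2 summary and the Qualimap bamqc report above.
pairs = 37_317_546              # input read pairs
unaligned_mates = 22_785_681    # mates that "aligned 0 times"
bam_mapped = 179_886_195        # Qualimap "number of mapped reads"

# HISAT2 counts per input mate: each pair contributes two mates.
total_mates = 2 * pairs
hisat2_rate = 100 * (1 - unaligned_mates / total_mates)

# Assumption: total BAM entries = mapped entries + one entry per unaligned mate.
bam_entries = bam_mapped + unaligned_mates
bamqc_rate = 100 * bam_mapped / bam_entries

print(f"HISAT2 overall alignment rate: {hisat2_rate:.2f}%")
print(f"Unmapped, per input read:      {100 - hisat2_rate:.2f}%")
print(f"bamqc mapped entries:          {bamqc_rate:.2f}%")
print(f"Unmapped, per BAM entry:       {100 - bamqc_rate:.2f}%")
```

Under that assumption the script recovers both reported figures (69.47% and 88.76%), so the exact per-entry unmapped share is 11.24%; the 12% above comes from rounding 88.76% down to 88%.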

Thank you very much. I guess there is a typo in your comment. It should be 100% - 88% = 12%.