I am confused about the alignment stats I am getting and I really hope someone can explain them to me!
I ran HISAT2 with default parameters and the available grch38_tra index. The results HISAT2 reports back look fine to me. See the example below, where I have an alignment rate of ~83%:
    5389593 (28.98%) aligned concordantly 0 times
    11974983 (64.39%) aligned concordantly exactly 1 time
    1233844 (6.63%) aligned concordantly >1 times
    ----
    5389593 pairs aligned concordantly 0 times; of these:
      1021332 (18.95%) aligned discordantly 1 time
    ----
    4368261 pairs aligned 0 times concordantly or discordantly; of these:
      8736522 mates make up the pairs; of these:
        6676246 (76.42%) aligned 0 times
        1714031 (19.62%) aligned exactly 1 time
        346245 (3.96%) aligned >1 times
This makes sense to me, but the Qualimap results confuse me:
    Number of mapped reads (left/right): 15,693,967 / 14,826,627
    Number of aligned pairs (without duplicates): 13,208,827
    Total number of alignments: 42,737,940
    Number of secondary alignments: 12,217,346
    Number of non-unique alignments: 15,018,799
    Aligned to genes: 10,778,652
    Ambiguous alignments: 1,313,140
    No feature assigned: 15,611,447
    Missing chromosome in annotation: 15,902
    Not aligned: 6,676,246
    Strand specificity estimation (fwd/rev): 0.03 / 0.97
What really threw me was the total number of alignments:
    Total number of alignments: 42,737,940
The mapped reads (left + right) add up to
    15,693,967 + 14,826,627 = 30,520,594 reads
which matches the HISAT2 results:
    11974983*2 + 1233844*2 + 1021332*2 + 1714031 + 346245 = 30,520,594 reads
That leaves
    42,737,940 - 30,520,594 = 12,217,346 secondary alignments
This seems like a lot, and now I am worried something has gone wrong...
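To rule out a slip on my side, here is the same bookkeeping as a small Python sketch. The variable names are my own, and the counts are simply copied from the HISAT2 summary and the Qualimap report above; it only checks the arithmetic, not how either tool counts internally.

```python
# Counts copied from the HISAT2 summary above.
concordant_once = 11_974_983    # pairs aligned concordantly exactly 1 time
concordant_multi = 1_233_844    # pairs aligned concordantly >1 times
discordant_once = 1_021_332     # pairs aligned discordantly 1 time
mates_once = 1_714_031          # single mates aligned exactly 1 time
mates_multi = 346_245           # single mates aligned >1 times

# Each aligned pair contributes two mapped reads; a lone mate contributes one.
hisat2_mapped_reads = (
    2 * (concordant_once + concordant_multi + discordant_once)
    + mates_once
    + mates_multi
)

# Counts copied from the Qualimap report above.
qualimap_mapped_reads = 15_693_967 + 14_826_627   # left + right
total_alignments = 42_737_940
secondary_reported = 12_217_346

# Alignments beyond one per mapped read.
extra_alignments = total_alignments - qualimap_mapped_reads

print("HISAT2 mapped reads:  ", hisat2_mapped_reads)
print("Qualimap mapped reads:", qualimap_mapped_reads)
print("Both tools agree:     ", hisat2_mapped_reads == qualimap_mapped_reads)
print("Extra == secondary:   ", extra_alignments == secondary_reported)
```

Both comparisons come out true, so the numbers themselves are consistent between the two tools.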
But HISAT2 says
    1233844 (6.63%) aligned concordantly >1 times
and
    346245 (3.96%) aligned >1 times
which doesn't seem so bad.
How does this fit together? Does it mean that a small number of reads map to many locations? As far as I know, HISAT2 reports a maximum of k=5 distinct alignments per read in default mode. Does it mean that most of the
    1233844*2 + 346245
multi-mapped reads map around 5 times (and possibly more often if I had allowed a higher k)?
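As a rough way to put a number on this, the sketch below divides the secondary alignments by the reads HISAT2 reports as aligned more than once. It rests on my assumption that every secondary alignment belongs to one of those reads, which I have not verified; the variable names are mine.

```python
secondary_alignments = 12_217_346

# Reads HISAT2 reports as aligned >1 times:
# both mates of multi-mapped concordant pairs, plus multi-mapped lone mates.
multi_mapped_reads = 2 * 1_233_844 + 346_245

# Assumption: all secondary alignments come from these reads.
secondary_per_read = secondary_alignments / multi_mapped_reads
alignments_per_read = 1 + secondary_per_read   # one primary plus its secondaries

print(f"multi-mapped reads:        {multi_mapped_reads:,}")
print(f"secondary per read (mean): {secondary_per_read:.2f}")
print(f"alignments per read (mean): {alignments_per_read:.2f}")
```

With my numbers this gives about 4.3 secondary alignments per multi-mapped read on average, so slightly more than 5 alignments per read in total under that assumption.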
Is this how the number of secondary alignments and the number of non-unique alignments relate to each other? The number of non-unique alignments would then be the secondary alignments plus the primary alignments of the multi-mappers:
    1233844*2 + 346245 + 12,217,346 = 15,031,279
which is close to
    Number of non-unique alignments: 15,018,799
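The same comparison as a short Python sketch, again with my own variable names and the counts copied from above. It only quantifies how far my guess is from the Qualimap figure; it does not confirm that this is how Qualimap defines non-unique alignments.

```python
# Primary alignments of reads HISAT2 reports as aligned >1 times.
multi_mapper_primaries = 2 * 1_233_844 + 346_245
secondary_alignments = 12_217_346
reported_non_unique = 15_018_799   # from the Qualimap report

# My guess: non-unique = multi-mapper primaries + all secondary alignments.
estimated_non_unique = multi_mapper_primaries + secondary_alignments
difference = estimated_non_unique - reported_non_unique

print(f"estimated: {estimated_non_unique:,}")
print(f"reported:  {reported_non_unique:,}")
print(f"difference: {difference:,} ({difference / reported_non_unique:.3%})")
```

The estimate overshoots the reported value by 12,480 alignments, which is less than 0.1%.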
Is this something to worry about? I see this with most of my samples. What do you use as a cutoff/threshold for multi-mapping reads when doing quality control on a sample? Thanks for your input!