Question: Confusing RNA-seq Alignment Stats (HISAT2 & Qualimap)
JJ wrote, 15 months ago:

I am confused about the alignment stats I am getting and I really hope someone can explain them to me!

I ran HISAT2 with default parameters against the available grch38_tra index. The results HISAT2 reports look fine to me. Here is an example, with an overall alignment rate of ~82%:

    5389593 (28.98%) aligned concordantly 0 times
    11974983 (64.39%) aligned concordantly exactly 1 time
    1233844 (6.63%) aligned concordantly >1 times
    ----
    5389593 pairs aligned concordantly 0 times; of these:
      1021332 (18.95%) aligned discordantly 1 time
    ----
    4368261 pairs aligned 0 times concordantly or discordantly; of these:
      8736522 mates make up the pairs; of these:
        6676246 (76.42%) aligned 0 times
        1714031 (19.62%) aligned exactly 1 time
        346245 (3.96%) aligned >1 times
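
As a rough cross-check, the overall alignment rate can be recomputed from the counts in the summary above. The sketch below assumes the three "aligned concordantly" lines together cover every input pair, as in the usual HISAT2 summary layout, and that each pair contributes two mates. The variable names are illustrative only.

```python
# Counts copied from the HISAT2 summary quoted above.
concordant_0 = 5_389_593       # pairs aligned concordantly 0 times
concordant_1 = 11_974_983      # pairs aligned concordantly exactly 1 time
concordant_multi = 1_233_844   # pairs aligned concordantly >1 times
mates_unaligned = 6_676_246    # individual mates that aligned 0 times

# Assumption: the three concordant categories partition all input pairs.
total_pairs = concordant_0 + concordant_1 + concordant_multi
total_mates = 2 * total_pairs

# Every mate that is not in the "aligned 0 times" bucket aligned somewhere.
aligned_mates = total_mates - mates_unaligned
overall_rate = 100 * aligned_mates / total_mates

print(f"total pairs:            {total_pairs:,}")
print(f"aligned mates:          {aligned_mates:,} of {total_mates:,}")
print(f"overall alignment rate: {overall_rate:.2f}%")
```

The rate is computed per mate rather than per pair, because the last block of the summary counts mates individually.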

This makes sense to me, but the Qualimap results confuse me:

    Number of mapped reads (left/right): 15,693,967 / 14,826,627
    Number of aligned pairs (without duplicates): 13,208,827
    Total number of alignments: 42,737,940
    Number of secondary alignments: 12,217,346
    Number of non-unique alignments: 15,018,799
    Aligned to genes: 10,778,652
    Ambiguous alignments: 1,313,140
    No feature assigned: 15,611,447
    Missing chromosome in annotation: 15,902
    Not aligned: 6,676,246
    Strand specificity estimation (fwd/rev): 0.03 / 0.97

So, what really threw me was the Total number of alignments: 42,737,940.

15,693,967 + 14,826,627 = 30,520,594 reads

This matches the HISAT2 results: 11974983*2 + 1233844*2 + 1021332*2 + 1714031 + 346245 = 30,520,594 reads

42,737,940 - 30,520,594 = 12,217,346 secondary alignments. This seems like a lot, and now I am worried something has gone wrong...
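
The reconciliation above can be written out as a small script, which makes it easy to repeat for other samples. This is only a sketch of the arithmetic in the post: the numbers are the ones quoted from HISAT2 and Qualimap, and the variable names are made up for illustration.

```python
# Qualimap values quoted above.
mapped_left = 15_693_967
mapped_right = 14_826_627
total_alignments = 42_737_940
reported_secondary = 12_217_346

# HISAT2 values quoted above.
concordant_1 = 11_974_983      # pairs, concordant, exactly 1 time
concordant_multi = 1_233_844   # pairs, concordant, >1 times
discordant_1 = 1_021_332       # pairs, discordant, 1 time
single_1 = 1_714_031           # mates, aligned exactly 1 time
single_multi = 346_245         # mates, aligned >1 times

# Mapped reads according to each tool (pairs count twice, mates once).
qualimap_mapped = mapped_left + mapped_right
hisat2_mapped = (
    2 * (concordant_1 + concordant_multi + discordant_1)
    + single_1
    + single_multi
)

# Alignments beyond one per mapped read.
implied_secondary = total_alignments - qualimap_mapped

print(f"Qualimap mapped reads: {qualimap_mapped:,}")
print(f"HISAT2 mapped reads:   {hisat2_mapped:,}")
print(f"implied secondary:     {implied_secondary:,}")
print(f"matches Qualimap:      {implied_secondary == reported_secondary}")
```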

But HISAT2 says 1233844 (6.63%) aligned concordantly >1 times and 346245 (3.96%) aligned >1 times - this doesn't seem so bad.

How do these fit together? Does this mean that a small number of reads map very often? As far as I know, HISAT2 allows a maximum of k=5 distinct alignments in default mode. Does it mean that most of the 1233844*2 + 346245 multi-mapped reads map around 5 times (and possibly more often if I had allowed a higher k)?

Is this how the Number of secondary alignments and the Number of non-unique alignments relate to each other? The number of non-unique alignments would then be the secondary alignments plus the multi-mappers flagged as primary: 1233844*2 + 346245 + 12,217,346 = 15,031,279, which is close to the reported Number of non-unique alignments: 15,018,799.
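
To put numbers on the two questions above, the sketch below estimates how many secondary alignments each multi-mapped read carries on average, and how far the "primary multi-mappers plus secondary alignments" estimate is from the value Qualimap reports. Treating non-unique alignments this way is the hypothesis from the post, not a confirmed Qualimap definition, and the variable names are illustrative.

```python
# Counts quoted in the post above.
concordant_multi_pairs = 1_233_844   # HISAT2: pairs aligned concordantly >1 times
single_multi_mates = 346_245         # HISAT2: unpaired mates aligned >1 times
secondary_alignments = 12_217_346    # Qualimap: Number of secondary alignments
reported_non_unique = 15_018_799     # Qualimap: Number of non-unique alignments

# Reads with more than one alignment (each concordant pair holds two mates).
multi_mapped_reads = 2 * concordant_multi_pairs + single_multi_mates

# Hypothesis from the post: non-unique = primary multi-mappers + secondary.
estimated_non_unique = multi_mapped_reads + secondary_alignments
gap = estimated_non_unique - reported_non_unique

# Average number of extra (secondary) alignments per multi-mapped read.
mean_secondary = secondary_alignments / multi_mapped_reads

print(f"multi-mapped reads:          {multi_mapped_reads:,}")
print(f"estimated non-unique:        {estimated_non_unique:,}")
print(f"difference from Qualimap:    {gap:,}")
print(f"mean secondary per read:     {mean_secondary:.2f}")
```

With a cap of k alignments per read, a read can have at most one primary and k - 1 secondary alignments, so a mean close to k - 1 would be consistent with most multi-mappers reaching the cap.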

Is this something to worry about? I see this with most of my samples. What cutoff/threshold for multi-mapping reads do you use as a quality control for your samples? Thanks for your input!


lakhujanivijay commented, 15 months ago:

An explanation of the HISAT2 stats can be found here:

A: Evaluation of HISAT2 Alignment Result

JJ replied, 15 months ago:

Thanks for the link. I understand the HISAT2 results; my question was more about the Qualimap results and whether the Total number of alignments / Number of secondary alignments is too high. That said, I also get similar results for human ENCODE samples.

shunyip commented, 15 months ago:

How long are your reads?

JJ replied, 15 months ago:

The reads are 100 bp long and paired-end.
yztxwd (Southern Medical University) wrote, 15 months ago:

I don't see any big problem with your mapping result. It is quite normal for some fraction of reads to map to multiple places in the genome.

The cutoff/threshold is very hard to determine (for me) because it largely depends on the type of experiment. For example, if some genes are located in regions with very low sequence complexity, reads from them will easily map to multiple regions. Short reads also map to multiple places more easily; I think that's why @shunyip asked about the length of your reads. Moreover, different genomes, mutations, and cell lines all have a large impact on the kind of reads you get.

So I don't think you need to worry about your mapping results. It may be more useful to focus on Qualimap's other QC results.
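
If you want to see the multi-mapping directly rather than through summary stats, one option is to tally alignment records by their SAM FLAG bits. The sketch below is a minimal, pure-Python illustration on made-up toy records; `tally_sam_records` is a hypothetical helper, not part of HISAT2 or Qualimap. It relies only on the FLAG bits defined in the SAM specification (0x4 unmapped, 0x100 secondary, 0x800 supplementary).

```python
from collections import Counter

# FLAG bits from the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def tally_sam_records(lines):
    """Count SAM records by type, skipping header and blank lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second SAM column
        if flag & UNMAPPED:
            counts["unmapped"] += 1
        elif flag & SECONDARY:
            counts["secondary"] += 1
        elif flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
        else:
            counts["primary"] += 1
    return counts


# Toy records (invented for illustration): one uniquely mapped read,
# one read with a primary plus two secondary alignments, one unmapped read.
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t60\t100M\t=\t300\t300\t*\t*\tNH:i:1",
    "r2\t0\tchr1\t500\t1\t100M\t*\t0\t0\t*\t*\tNH:i:3",
    "r2\t256\tchr2\t900\t1\t100M\t*\t0\t0\t*\t*\tNH:i:3",
    "r2\t256\tchr5\t40\t1\t100M\t*\t0\t0\t*\t*\tNH:i:3",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

counts = tally_sam_records(toy_sam)
print(dict(counts))
```

On real data the same function could read lines from the text output of `samtools view -h`, and `samtools flagstat` reports comparable totals without any custom code.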


JJ commented, 15 months ago:

Thanks for your input. I just wanted to make sure, as the Total number of alignments / Number of secondary alignments seems very high, but the percentage of multi-mappers looks OK to me.