Question

The rate of "aligned concordantly exactly 1 time" in HISAT2


Asked 4.2 years ago by f86222

Hi all,

I am at the very beginning of processing and analyzing my RNA-seq data. I trimmed the reads with Trimmomatic and aligned them to a reference genome with HISAT2. I trimmed the data with two different settings:

One that removes 10 bases from the beginning of each read (HEADCROP:10), and one without this step.
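For readers unfamiliar with the step: HEADCROP:10 discards the first 10 bases of every read, together with their quality characters. The minimal Python sketch below mimics that effect on a single FASTQ record. `headcrop` is a hypothetical helper written for illustration, not Trimmomatic's own code.

```python
def headcrop(seq: str, qual: str, n: int) -> tuple[str, str]:
    """Drop the first n bases of a read and the matching quality characters.

    Hypothetical helper that mimics the effect of Trimmomatic's HEADCROP:n.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have the same length")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Slicing keeps everything from position n onward; a read shorter than n
    # becomes an empty string.
    return seq[n:], qual[n:]


if __name__ == "__main__":
    seq = "ACGTTGCAAGGCTTACGATCGATCG"
    qual = "I" * len(seq)
    cropped_seq, cropped_qual = headcrop(seq, qual, 10)
    print(cropped_seq)                        # GCTTACGATCGATCG
    print(len(seq), "->", len(cropped_seq))   # 25 -> 15
```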

However, the alignment rates for the paired-end reads differ greatly between the two trimming settings:

With HEADCROP:

43823976 reads; of these:
  43823976 (100.00%) were paired; of these:
    20376984 (46.50%) aligned concordantly 0 times
    22619436 (51.61%) aligned concordantly exactly 1 time
    827556 (1.89%) aligned concordantly >1 times
    ----
    20376984 pairs aligned concordantly 0 times; of these:
      9403592 (46.15%) aligned discordantly 1 time
    ----
    10973392 pairs aligned 0 times concordantly or discordantly; of these:
      21946784 mates make up the pairs; of these:
        12208010 (55.63%) aligned 0 times
        8593679 (39.16%) aligned exactly 1 time
        1145095 (5.22%) aligned >1 times
86.07% overall alignment rate

Without HEADCROP:

43953809 reads; of these:
  43953809 (100.00%) were paired; of these:
    7868205 (17.90%) aligned concordantly 0 times
    34738253 (79.03%) aligned concordantly exactly 1 time
    1347351 (3.07%) aligned concordantly >1 times
    ----
    7868205 pairs aligned concordantly 0 times; of these:
      342967 (4.36%) aligned discordantly 1 time
    ----
    7525238 pairs aligned 0 times concordantly or discordantly; of these:
      15050476 mates make up the pairs; of these:
        12437657 (82.64%) aligned 0 times
        2487014 (16.52%) aligned exactly 1 time
        125805 (0.84%) aligned >1 times
85.85% overall alignment rate
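One way to see what changed between the two runs is to pull the counts out of each summary and recompute the overall rate. Below is a minimal Python sketch under two assumptions: the summary text has exactly the layout printed above, and the overall rate counts every mate that aligned (two per concordant or discordant pair, plus mates from the remaining pairs that aligned on their own) over all mates. That formula is a reconstruction rather than something taken from the HISAT2 source, but it reproduces both the 86.07% and the 85.85% reported here. `parse_summary` and `overall_rate` are hypothetical names.

```python
import re

# Hypothetical patterns for a HISAT2 paired-end summary with the layout shown above.
PATTERNS = {
    "pairs": r"(\d+) \([\d.]+%\) were paired",
    "conc_1": r"(\d+) \([\d.]+%\) aligned concordantly exactly 1 time",
    "conc_multi": r"(\d+) \([\d.]+%\) aligned concordantly >1 times",
    "disc_1": r"(\d+) \([\d.]+%\) aligned discordantly 1 time",
    "mate_1": r"(\d+) \([\d.]+%\) aligned exactly 1 time",
    "mate_multi": r"(\d+) \([\d.]+%\) aligned >1 times",
}


def parse_summary(text: str) -> dict[str, int]:
    """Return the leading count of each summary line named in PATTERNS."""
    counts = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"line for {key!r} not found in summary")
        counts[key] = int(match.group(1))
    return counts


def overall_rate(c: dict[str, int]) -> float:
    """Percentage of all mates that aligned (assumed formula, see lead-in)."""
    # Both mates count as aligned for concordant and discordant pairs.
    mates_in_pairs = 2 * (c["conc_1"] + c["conc_multi"] + c["disc_1"])
    # Mates from the remaining pairs that aligned on their own.
    lone_mates = c["mate_1"] + c["mate_multi"]
    return 100.0 * (mates_in_pairs + lone_mates) / (2 * c["pairs"])


HEADCROP = """\
43823976 reads; of these:
  43823976 (100.00%) were paired; of these:
    20376984 (46.50%) aligned concordantly 0 times
    22619436 (51.61%) aligned concordantly exactly 1 time
    827556 (1.89%) aligned concordantly >1 times
    ----
    20376984 pairs aligned concordantly 0 times; of these:
      9403592 (46.15%) aligned discordantly 1 time
    ----
    10973392 pairs aligned 0 times concordantly or discordantly; of these:
      21946784 mates make up the pairs; of these:
        12208010 (55.63%) aligned 0 times
        8593679 (39.16%) aligned exactly 1 time
        1145095 (5.22%) aligned >1 times
86.07% overall alignment rate
"""

if __name__ == "__main__":
    c = parse_summary(HEADCROP)
    print(f"overall: {overall_rate(c):.2f}%")  # overall: 86.07%
    # Share of all pairs that aligned concordantly exactly once.
    print(f"concordant, unique: {100 * c['conc_1'] / c['pairs']:.2f}%")  # 51.61%
```

Running the same parser on the summary without HEADCROP gives 85.85%, so the overall rate barely moves; what shifts is how the aligned reads are split between concordant pairs, discordant pairs and single mates.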

Why are the "aligned concordantly 0 times" and "aligned concordantly exactly 1 time" rates so different between the two runs?

Could the low rate of "aligned concordantly exactly 1 time" cause problems in downstream analysis (e.g. counting reads per gene with featureCounts)?

I would appreciate any help with this. Thank you very much for your time.

Best,

Yi-Ting Fang


Why did you remove 10 nt from the start of the reads? This was clearly unnecessary, as HISAT2 soft-clips alignments when needed (unless told otherwise). Your alignment rate was already very good to begin with. It could be that you trimmed your reads too short and now they multi-map.

Reply by benformatics (3.9k) • 4.2 years ago
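To check the soft-clipping point in practice, one could look at the CIGAR strings (SAM column 6) of the alignments from the untrimmed run and count how many bases HISAT2 clipped at the 5' end of each read. The Python sketch below does this for (FLAG, CIGAR) pairs. `five_prime_softclip` is a hypothetical helper, and the example records are made up; real ones could be taken from `samtools view` output. It relies on two SAM conventions: the CIGAR is written left to right along the reference, and FLAG bit 0x10 marks a read aligned to the reverse strand, whose 5' end is therefore at the right-hand end of the CIGAR.

```python
import re
from collections import Counter

# CIGAR operations as defined in the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REVERSE_STRAND = 0x10  # FLAG bit: sequence is reverse complemented


def five_prime_softclip(flag: int, cigar: str) -> int:
    """Soft-clipped bases at the read's 5' end (0 if none or unmapped '*')."""
    ops = CIGAR_OP.findall(cigar)
    if flag & REVERSE_STRAND:
        # The 5' end of a reverse-strand read is the last CIGAR operation.
        ops = ops[::-1]
    i = 0
    # Hard clips, if present, sit outside any soft clip; skip them.
    while i < len(ops) and ops[i][1] == "H":
        i += 1
    if i < len(ops) and ops[i][1] == "S":
        return int(ops[i][0])
    return 0


if __name__ == "__main__":
    # Hypothetical (FLAG, CIGAR) pairs, e.g. columns 2 and 6 of `samtools view`.
    records = [(99, "8S92M"), (147, "92M8S"), (83, "100M"), (163, "3S97M"), (4, "*")]
    clips = Counter(five_prime_softclip(f, c) for f, c in records)
    print(sorted(clips.items()))  # [(0, 2), (3, 1), (8, 2)]
```

If many untrimmed reads carried short 5' soft clips, that would suggest HISAT2 is already compensating for the first bases on its own.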


Because the proportions of A, T, C and G changed drastically over the first bases of the reads, I expected that removing the first 10 nt might improve mapping accuracy. Given how HISAT2 works, is this step redundant, or even harmful? Thank you.

Reply by f86222 • 4.2 years ago
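The per-position check described in this reply is what FastQC reports as "Per base sequence content". As a rough illustration, the Python sketch below computes the percentage of A, C, G and T at each of the first positions of a set of reads, so the composition can be compared before and after HEADCROP. `sequences_from_fastq` and `base_composition` are hypothetical helpers, and the FASTQ parser assumes plain four-line records with no line wrapping.

```python
from collections import Counter
from collections.abc import Iterable


def sequences_from_fastq(lines: Iterable[str]) -> list[str]:
    """Return the sequence line (2nd of every 4) from an unwrapped FASTQ file."""
    return [line.strip() for i, line in enumerate(lines) if i % 4 == 1]


def base_composition(reads: list[str], n_positions: int) -> list[dict[str, float]]:
    """Percentage of A, C, G and T at each of the first n_positions."""
    table = []
    for pos in range(n_positions):
        # Only reads long enough to have a base at this position contribute.
        counts = Counter(read[pos] for read in reads if len(read) > pos)
        total = sum(counts[base] for base in "ACGT")  # ignore N and other codes
        table.append(
            {base: (100.0 * counts[base] / total if total else 0.0) for base in "ACGT"}
        )
    return table


if __name__ == "__main__":
    # Toy records; with a real file, pass open("reads.fastq") (hypothetical path).
    fastq = [
        "@r1", "ACGTAC", "+", "IIIIII",
        "@r2", "AGGTTC", "+", "IIIIII",
        "@r3", "ATCAGG", "+", "IIIIII",
        "@r4", "ACGACA", "+", "IIIIII",
    ]
    reads = sequences_from_fastq(fastq)
    for pos, row in enumerate(base_composition(reads, 3), start=1):
        print(pos, {b: round(p, 1) for b, p in row.items()})
```

Running it on the raw reads and again on the HEADCROP:10 output would show whether the skewed positions are exactly the ones being removed.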