Question: The rate of "aligned concordantly exactly 1 time" in Hisat2
5 months ago by f86222, National Taiwan Normal University, Taiwan
f86222 wrote:

Hi all,

I am just beginning to process and analyze RNA-seq data. I trim the reads with Trimmomatic and align them to a reference genome with HISAT2. I trimmed the data with two different settings:

One removes 10 bases from the start of each read (HEADCROP:10); the other does not use this setting.

However, I found that the alignment rates for paired-end reads differ hugely depending on how I trim the data. For example:

with HEADCROP:

43823976 reads; of these:
  43823976 (100.00%) were paired; of these:
    20376984 (46.50%) aligned concordantly 0 times
    22619436 (51.61%) aligned concordantly exactly 1 time
    827556 (1.89%) aligned concordantly >1 times
    ----
    20376984 pairs aligned concordantly 0 times; of these:
      9403592 (46.15%) aligned discordantly 1 time
    ----
    10973392 pairs aligned 0 times concordantly or discordantly; of these:
      21946784 mates make up the pairs; of these:
        12208010 (55.63%) aligned 0 times
        8593679 (39.16%) aligned exactly 1 time
        1145095 (5.22%) aligned >1 times
86.07% overall alignment rate
  

without HEADCROP:

43953809 reads; of these:
  43953809 (100.00%) were paired; of these:
    7868205 (17.90%) aligned concordantly 0 times
    34738253 (79.03%) aligned concordantly exactly 1 time
    1347351 (3.07%) aligned concordantly >1 times
    ----
    7868205 pairs aligned concordantly 0 times; of these:
      342967 (4.36%) aligned discordantly 1 time
    ----
    7525238 pairs aligned 0 times concordantly or discordantly; of these:
      15050476 mates make up the pairs; of these:
        12437657 (82.64%) aligned 0 times
        2487014 (16.52%) aligned exactly 1 time
        125805 (0.84%) aligned >1 times
85.85% overall alignment rate
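
For reference, the "overall alignment rate" on the last line of each summary can be reproduced from the counts above. The sketch below assumes the rate is the number of aligned mates (two per concordantly or discordantly aligned pair, plus mates that aligned on their own) divided by the total number of mates; the function name and parameter names are made up for illustration and are not part of HISAT2.

```python
def overall_alignment_rate(total_pairs, conc_once, conc_multi,
                           disc_once, mate_once, mate_multi):
    """Percentage of mates that aligned in any way (assumed formula).

    Each concordant or discordant pair contributes two aligned mates;
    mates from otherwise unaligned pairs are counted individually.
    """
    aligned_mates = 2 * (conc_once + conc_multi + disc_once) + mate_once + mate_multi
    return 100.0 * aligned_mates / (2 * total_pairs)


# Counts copied from the two summaries above.
with_headcrop = overall_alignment_rate(
    43823976, 22619436, 827556, 9403592, 8593679, 1145095)
without_headcrop = overall_alignment_rate(
    43953809, 34738253, 1347351, 342967, 2487014, 125805)

print(f"with HEADCROP:    {with_headcrop:.2f}%")     # 86.07%
print(f"without HEADCROP: {without_headcrop:.2f}%")  # 85.85%
```

Both values match the reported rates, so the two runs align almost the same fraction of mates. What changes is how they align: with HEADCROP about 21% of all pairs align discordantly (9403592 of 43823976), compared with under 1% without it (342967 of 43953809).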
  

Why are the rates for "aligned concordantly 0 times" and "aligned concordantly exactly 1 time" so different between the two runs?

Will the low rate of "aligned concordantly exactly 1 time" be a problem for downstream analysis (e.g. counting reads per gene with featureCounts)?

I would appreciate any help with this situation. Thank you very much for your time.

Best,

Yi-Ting Fang

rna-seq alignment • 198 views

Why did you remove 10 nt from the start of the reads? This was clearly unnecessary, as HISAT2 will soft-clip alignments when needed unless otherwise specified. Your alignment rate was very good in the first place. It could be that you trimmed your reads too short and now they multi-map.

written 5 months ago by benformatics

Because I found that the percentages of A, T, C and G change drastically at the beginning of the reads, and I expected that removing 10 nt from the start might improve mapping accuracy. Given how HISAT2 works, is this trimming redundant, or even harmful? Thank you.
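
To make the observation concrete, here is a minimal sketch of how the per-position base percentages could be computed directly from a FASTQ file. It assumes plain four-line FASTQ records; the function name `base_composition` and the tiny in-memory example are hypothetical and only for illustration.

```python
from collections import Counter


def base_composition(fastq_lines, n_positions=15):
    """Return, for each of the first n_positions, the percentage of A, C, G, T.

    Assumes four-line FASTQ records, so the sequence is every 4th line
    starting from the second. Other symbols such as N count toward the
    total, so the four percentages may sum to less than 100.
    """
    counts = [Counter() for _ in range(n_positions)]
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:  # skip header, '+' separator, and quality lines
            continue
        seq = line.strip().upper()
        for pos, base in enumerate(seq[:n_positions]):
            counts[pos][base] += 1

    result = []
    for c in counts:
        total = sum(c.values())
        result.append({b: (100.0 * c[b] / total if total else 0.0) for b in "ACGT"})
    return result


# Two toy reads standing in for a real file.
example = ["@r1", "ACGT", "+", "IIII",
           "@r2", "AGGT", "+", "IIII"]
comp = base_composition(example, n_positions=4)
for pos, percentages in enumerate(comp, start=1):
    print(pos, percentages)
```

For a real file, the same function could be called on an open file handle, e.g. `with open("reads_1.fastq") as fh: base_composition(fh)`, where the file name is a placeholder.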

modified 5 months ago • written 5 months ago by f86222