Question

Why does HISAT2 give low concordant pair alignment rates for some samples but high rates for others with the same options?


9.0 years ago

Peng Huang

Hi, everyone!

Recently I have been analyzing a transcriptomic dataset comprising 66 RNA-seq samples (paired-end, 150 bp, average depth 60M reads). After adapter trimming with Cutadapt and low-quality read trimming with Trimmomatic, I aligned the reads with the latest version of HISAT2, using default options apart from --dta. Here is an example command:

hisat2 -p 2 --dta -x /data/huangp/HCC_RNA-seq/genome_snp_tran -q -1 /data/huangp/HCC/WGC066460R_paired_1.fastq -2 /data/huangp/HCC/WGC066460R_paired_2.fastq -S /data/huangp/HCC/WGC066460R_discordant_enable.sam 2> WGC066460R_summary_discordant_enable.txt &

The alignment summaries showed that all 66 samples had a very high overall alignment rate (average > 97.4%), but 20 of them had a much lower concordant pair alignment rate than the other 46 samples (average 71.78% vs. 89.43%). The summaries for one sample from each group are shown below.

[huangp@localhost HCC]$ tail -n 15 ./WGC066520R_summary_discordant_enable.txt

74127014 reads; of these:
  74127014 (100.00%) were paired; of these:
    21829528 (29.45%) aligned concordantly 0 times
    37429446 (50.49%) aligned concordantly exactly 1 time
    14868040 (20.06%) aligned concordantly >1 times
    ----
    21829528 pairs aligned concordantly 0 times; of these:
      16038791 (73.47%) aligned discordantly 1 time
    ----
    5790737 pairs aligned 0 times concordantly or discordantly; of these:
      11581474 mates make up the pairs; of these:
        2944609 (25.43%) aligned 0 times
        3136582 (27.08%) aligned exactly 1 time
        5500283 (47.49%) aligned >1 times
98.01% overall alignment rate

[huangp@localhost HCC]$ tail -n 15 ./WGC066460R_summary_discordant_enable.txt

71399828 reads; of these:
  71399828 (100.00%) were paired; of these:
    6859529 (9.61%) aligned concordantly 0 times
    51649822 (72.34%) aligned concordantly exactly 1 time
    12890477 (18.05%) aligned concordantly >1 times
    ----
    6859529 pairs aligned concordantly 0 times; of these:
      3168251 (46.19%) aligned discordantly 1 time
    ----
    3691278 pairs aligned 0 times concordantly or discordantly; of these:
      7382556 mates make up the pairs; of these:
        4285338 (58.05%) aligned 0 times
        1810517 (24.52%) aligned exactly 1 time
        1286701 (17.43%) aligned >1 times
97.00% overall alignment rate
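A quick way to see where every sample falls is to pull the concordant rate out of each summary file. The sketch below is a minimal, hypothetical helper (the function names `concordant_rate` and `report_all` are illustrative, not part of HISAT2). It assumes the classic summary layout shown above and takes the concordant rate as the sum of the "exactly 1 time" and ">1 times" percentages, e.g. 50.49% + 20.06% = 70.55% for WGC066520R and 72.34% + 18.05% = 90.39% for WGC066460R.

```bash
#!/usr/bin/env bash
# Sketch: report the concordant pair rate from HISAT2 alignment summaries.
# ASSUMPTION: classic (non --new-summary) layout, where
#   concordant rate = "exactly 1 time" % + ">1 times" %.

concordant_rate() {
  # Reads one summary on stdin and sums the percentages on the two
  # "aligned concordantly" lines that are not the "0 times" line.
  awk '
    /aligned concordantly exactly 1 time/ || /aligned concordantly >1 times/ {
      pct = $2                 # e.g. "(50.49%)"
      gsub(/[()%]/, "", pct)   # strip brackets and percent sign
      total += pct
    }
    END { printf "%.2f\n", total }
  '
}

report_all() {
  # Prints "<file><TAB><rate>" for every summary file in the current
  # directory, sorted from lowest to highest concordant rate.
  local f
  for f in ./*_summary*.txt; do
    [ -e "$f" ] || continue    # glob matched nothing
    printf '%s\t%s\n' "$f" "$(concordant_rate < "$f")"
  done | sort -t $'\t' -k2,2n
}
```

Running `report_all` from the directory holding the `*_summary_discordant_enable.txt` files prints one line per sample, lowest rate first, which makes it easy to check whether the 20 low samples form a clean, separate group.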

The sequencing company told me that the libraries for these 20 samples were constructed at the same time but sequenced on different flow cells of the same sequencing machine (the platform was an Illumina X10).
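Since the low samples share a library batch but were spread across flow cells, it may help to confirm which flow cell and lane each FASTQ actually came from. This sketch assumes Casava 1.8+ style Illumina read names (`@instrument:run:flowcell:lane:tile:x:y ...`) and standard 4-line FASTQ records. The function name `flowcell_lanes` is hypothetical, and files whose read names were rewritten, or that are gzipped (which would need `zcat` first), will not parse this way.

```bash
#!/usr/bin/env bash
# Sketch: count reads per flow cell and lane in an uncompressed FASTQ file.
# ASSUMPTION: Casava 1.8+ read names, e.g.
#   @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>
# and 4-line records (header, sequence, "+", qualities).

flowcell_lanes() {
  # Line 1 of every 4-line record is the header. $1 is the read name before
  # the space; its 3rd and 4th ":"-separated fields are flow cell and lane.
  awk 'NR % 4 == 1 { split($1, f, ":"); print f[3] ":" f[4] }' "$1" \
    | sort | uniq -c
}

# Example loop (hypothetical; paths follow the naming used in the post):
# for r1 in /data/huangp/HCC/*_paired_1.fastq; do
#   echo "== $r1"; flowcell_lanes "$r1"
# done
```

Scanning a whole 60M-read file is slow; `flowcell_lanes <(head -n 400000 file.fastq)` checks only the first 100,000 reads, at the cost of missing lanes that appear later in the file.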

At first I thought the maximum fragment length constraint might be too strict, so I raised it (--maxins / -X) from the default 500 bp to 800 bp and even 1000 bp, but only a few additional pairs became concordantly aligned. I then wondered whether HISAT2's fragment-length calculation includes intron length for exon-spanning reads, since many introns can be very long. But if so, why do the same options give a high concordant pair rate for the other 46 samples?
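One way to test the intron hypothesis directly is to look at how the non-concordant pairs fail. The sketch below (the function name `discordant_breakdown` is hypothetical) reads the SAM written by `-S`, keeps each primary first-mate record where both mates aligned but the properly-paired flag (0x2) is unset, and splits those pairs into mates on different chromosomes, mates on the same chromosome with |TLEN| above a cutoff, and everything else (e.g. unexpected orientation). Per the SAM specification, TLEN runs from the leftmost to the rightmost mapped base of the pair, so an intron lying between the mates is counted in it; whether HISAT2's own concordance check measures fragment length the same way is an assumption this check does not settle.

```bash
#!/usr/bin/env bash
# Sketch: classify why aligned pairs were NOT flagged as properly paired.
# Reads SAM text on stdin. $1 = fragment-length cutoff (default 500,
# matching HISAT2's default --maxins).

discordant_breakdown() {
  local maxins="${1:-500}"
  awk -F '\t' -v maxins="$maxins" '
    function bit(x, b) { return int(x / b) % 2 }    # test one SAM flag bit
    /^@/ { next }                                    # skip header lines
    {
      flag = $2 + 0
      if (!bit(flag, 1) || !bit(flag, 64)) next      # paired, first mate only
      if (bit(flag, 256) || bit(flag, 2048)) next    # primary records only
      if (bit(flag, 4) || bit(flag, 8)) next         # both mates mapped
      if (bit(flag, 2)) next                         # skip proper pairs
      tlen = $9 + 0; if (tlen < 0) tlen = -tlen
      if ($7 != "=" && $7 != $3) diff_chr++          # mate on another chromosome
      else if (tlen > maxins)    too_far++           # span exceeds the cutoff
      else                       other++             # orientation, etc.
      n++
    }
    END {
      printf "non-concordant pairs=%d different_chr=%d tlen_gt_%d=%d other=%d\n",
             n, diff_chr, maxins, too_far, other
    }
  '
}

# Usage: discordant_breakdown 500 < /data/huangp/HCC/WGC066460R_discordant_enable.sam
```

If the low samples put most of their failures in the `tlen_gt_500` bucket while the high samples do not, fragment span (including any spanned introns) is a plausible culprit; a large `different_chr` count would point away from the fragment-length explanation.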

Any advice will be appreciated!

Tags: RNA-Seq, HISAT2, low concordantly mapping pair rate



I'm facing the same problem now. Did you find an answer?

8.4 years ago by sherry521007