Hi,
I am using the new HISAT2 v2.2.0 to align paired-end RNA-seq data. The alignment report suggests that not all the reads were paired, which confuses me: I thought all reads in PE sequencing have mates. The new HISAT2 summary looks like this:
HISAT2 summary stats:
Total pairs: 76700832
  Aligned concordantly or discordantly 0 time: 3686808 (4.81%)
  Aligned concordantly 1 time: 60966514 (79.49%)
  Aligned concordantly >1 times: 11843091 (15.44%)
  Aligned discordantly 1 time: 204419 (0.27%)
Total unpaired reads: 7373616
  Aligned 0 time: 3934307 (53.36%)
  Aligned 1 time: 2579590 (34.98%)
  Aligned >1 times: 859719 (11.66%)
Overall alignment rate: 97.44%
The old HISAT2 summary, by contrast, used to look like this (taken from the HISAT2 website):
Alignment summary (not for the same data; I just want to show that it used to say 100% of reads were paired)
10000 reads; of these:
10000 (100.00%) were paired; of these:
650 (6.50%) aligned concordantly 0 times
8823 (88.23%) aligned concordantly exactly 1 time
527 (5.27%) aligned concordantly >1 times
----
650 pairs aligned concordantly 0 times; of these:
34 (5.23%) aligned discordantly 1 time
----
616 pairs aligned 0 times concordantly or discordantly; of these:
1232 mates make up the pairs; of these:
660 (53.57%) aligned 0 times
571 (46.35%) aligned exactly 1 time
1 (0.08%) aligned >1 times
96.70% overall alignment rate
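For reference, the old-format numbers above are internally consistent, and a short Python sketch makes the bookkeeping explicit. The variable names are my own, and the formula for the overall rate is an assumption that happens to reproduce the reported 96.70%; it is not taken from the HISAT2 source.

```python
# Numbers copied from the old-format example summary above.
old_total_pairs = 10000
old_concordant_0 = 650        # pairs aligned concordantly 0 times
old_discordant_1 = 34         # of those, pairs aligned discordantly 1 time
old_pairs_unaligned = 616     # pairs aligned 0 times concordantly or discordantly
old_mates = 1232              # "mates make up the pairs"
old_mates_unaligned = 660     # mates aligned 0 times

# Pairs left over after concordant and discordant alignment.
assert old_concordant_0 - old_discordant_1 == old_pairs_unaligned
# Each leftover pair contributes two mates, which are then aligned individually.
assert 2 * old_pairs_unaligned == old_mates

# Assumed formula: every mate counts once, and only mates that never
# aligned (neither as a pair nor individually) count as unaligned.
old_overall_rate = 100 * (1 - old_mates_unaligned / (2 * old_total_pairs))
print(f"overall alignment rate: {old_overall_rate:.2f}%")
```

So even in the old report, the mates of pairs that failed to align as a pair were aligned on their own; they were just listed as "mates" rather than "unpaired reads".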
I calculated that almost 9% of my reads are now reported as not paired. Is that normal? Did older HISAT2 versions discard unpaired reads before alignment?
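A quick arithmetic check on the new-format summary may help here. In the numbers posted, "Total unpaired reads" is exactly twice the number of pairs that aligned concordantly or discordantly 0 times. The sketch below assumes (this is an interpretation, not something the report states) that the "unpaired reads" are the individual mates of those pairs, aligned separately, rather than reads that arrived without a mate. Variable names are my own, and the "almost 9%" calculation is only a guess at how that figure was obtained.

```python
# Numbers copied from the HISAT2 v2.2.0 summary in the question.
total_pairs = 76_700_832
pairs_not_aligned_as_pair = 3_686_808   # aligned concordantly or discordantly 0 time
unpaired_reads = 7_373_616              # "Total unpaired reads"
unpaired_unaligned = 3_934_307          # unpaired reads aligned 0 time

# The unpaired count is exactly two mates per pair that failed to align as a pair.
assert unpaired_reads == 2 * pairs_not_aligned_as_pair

total_mates = 2 * total_pairs

# Share of all sequenced mates that ended up in the "unpaired" section.
share_of_mates = 100 * unpaired_reads / total_mates

# One possible route to "almost 9%" (a guess): unpaired / (pairs + unpaired).
share_guess = 100 * unpaired_reads / (total_pairs + unpaired_reads)

# Assumed overall-rate formula: mates that never aligned over all mates.
overall_rate = 100 * (1 - unpaired_unaligned / total_mates)

print(f"unpaired as share of all mates: {share_of_mates:.2f}%")
print(f"unpaired / (pairs + unpaired):  {share_guess:.2f}%")
print(f"overall alignment rate:         {overall_rate:.2f}%")
```

Under that assumption, every input read still had a mate; the "unpaired" section only covers the 4.81% of pairs that could not be placed as a pair, and the computed overall rate matches the reported 97.44%.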
Did you manipulate the FASTQ files somehow, for example by trimming or any kind of custom filtering?
No. I just ran FastQC before aligning and didn't perform any trimming. The second summary is actually not for the same data. I will clarify that in the question.
This unrelated post also has an example where the read summary is separated into "paired end reads" and "unpaired end reads".
hisat2 --sra-acc with paired reads producing single read output