HISAT2: Differences in paired and unpaired alignment rates
4.9 years ago
gwe3409

Why does the alignment rate differ depending on whether I run the same data as paired or unpaired?

I am new to genomic data analysis, and I had assumed that paired data makes alignment more efficient because each fragment is read from both its 5' and 3' ends. Judging from the difference, I think I am wrong here.
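One thing pairing does buy is a constraint on placement: if one mate lands uniquely, the other mate's candidate positions can be filtered to those that form a plausible fragment. The sketch below is a simplified, hypothetical illustration that assumes an FR library, a 500 bp maximum fragment length and no splicing between mates; it is not HISAT2's actual concordance logic, and the `Hit`/`resolve` names are made up for the example.

```python
# Simplified, hypothetical illustration; NOT HISAT2's algorithm.
# Assumptions: FR library (forward mate upstream, reverse mate downstream),
# a 500 bp maximum fragment length, and no splicing between mates.
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    chrom: str
    start: int   # 0-based leftmost aligned position
    end: int     # exclusive end position
    strand: str  # "+" or "-"


def is_concordant_fr(a: Hit, b: Hit, max_frag: int = 500) -> bool:
    """True if the two hits face each other on the same chromosome and
    the fragment they imply is no longer than max_frag."""
    if a.chrom != b.chrom or a.strand == b.strand:
        return False
    fwd, rev = (a, b) if a.strand == "+" else (b, a)
    if fwd.start > rev.start:  # forward mate must not start downstream
        return False
    return rev.end - fwd.start <= max_frag


def resolve(candidates: list[Hit], mate: Hit, max_frag: int = 500) -> list[Hit]:
    """Keep only the candidate placements that pair concordantly with mate."""
    return [h for h in candidates if is_concordant_fr(h, mate, max_frag)]


# Mate 1 matches two repeat copies equally well; mate 2 is unique.
mate1_hits = [Hit("chr1", 10_000, 10_100, "+"), Hit("chr5", 72_000, 72_100, "+")]
mate2 = Hit("chr1", 10_250, 10_350, "-")
print(resolve(mate1_hits, mate2))
# [Hit(chrom='chr1', start=10000, end=10100, strand='+')]
```

On its own, mate 1 would be a multi-mapper (and would get a low MAPQ); once the mate's position is taken into account, only one placement survives.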

I am using hisat2 on data that NCBI identifies as paired. The entry has this information:

Strategy: RNA-Seq
Selection: cDNA
Layout: PAIRED

I used the same hisat2 command for both runs, except that I used -1/-2 for the paired run and -U for the unpaired run.

hisat2 --dta -p 8 -t -x /home/gerald/hg38/genome -S /media/500/output.sam \
        -1 xxx.fastq.gz -2 yyy.fastq.gz
hisat2 --dta -p 8 -t -x /home/gerald/hg38/genome -S /media/500/output2.sam \
        -U xxx.fastq.gz,yyy.fastq.gz

Running it as paired data, I get:

Time loading forward index: 00:00:06
Time loading reference: 00:00:02
Multiseed full-index search: 00:38:48
37025651 reads; of these:
  37025651 (100.00%) were paired; of these:
    4900709 (13.24%) aligned concordantly 0 times
    21078023 (56.93%) aligned concordantly exactly 1 time
    11046919 (29.84%) aligned concordantly >1 times
    4900709 pairs aligned concordantly 0 times; of these:
      164376 (3.35%) aligned discordantly 1 time
    4736333 pairs aligned 0 times concordantly or discordantly; of these:
      9472666 mates make up the pairs; of these:
        6617630 (69.86%) aligned 0 times
        1583800 (16.72%) aligned exactly 1 time
        1271236 (13.42%) aligned >1 times
91.06% overall alignment rate
Time searching: 00:38:53
Overall time: 00:39:00
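The 91.06% figure counts mates, not pairs. A minimal Python sketch that reproduces it from the counts in the paired log, assuming the overall rate is aligned mates divided by all mates (an assumption, but one that matches these numbers exactly):

```python
# Counts copied from the paired-end HISAT2 summary above.
total_pairs = 37_025_651
concordant_1 = 21_078_023      # aligned concordantly exactly 1 time
concordant_multi = 11_046_919  # aligned concordantly >1 times
discordant_1 = 164_376         # aligned discordantly 1 time
mates_aligned_1 = 1_583_800    # leftover mates aligned exactly 1 time
mates_aligned_multi = 1_271_236  # leftover mates aligned >1 times

total_mates = 2 * total_pairs
# Each concordant or discordant pair contributes two aligned mates;
# mates from the remaining pairs that aligned on their own add one each.
aligned_mates = (2 * (concordant_1 + concordant_multi + discordant_1)
                 + mates_aligned_1 + mates_aligned_multi)
overall_rate = 100 * aligned_mates / total_mates
print(f"{aligned_mates} / {total_mates} mates = {overall_rate:.2f}%")
# 67433672 / 74051302 mates = 91.06%
```

The unaligned remainder, 74051302 - 67433672 = 6617630, is the "aligned 0 times" mate count in the log.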

Out of curiosity, I also ran the same data as unpaired:

Time loading forward index: 00:00:04
Time loading reference: 00:00:02
Multiseed full-index search: 00:36:31
74051302 reads; of these:
  74051302 (100.00%) were unpaired; of these:
    6949421 (9.38%) aligned 0 times
    44591138 (60.22%) aligned exactly 1 time
    22510743 (30.40%) aligned >1 times
90.62% overall alignment rate
Time searching: 00:36:34
Overall time: 00:36:38
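Both runs see the same 74051302 reads (2 × 37025651 mates), so the two overall rates can be compared directly. A small Python sketch using only the "aligned 0 times" counts from the two summaries above:

```python
# Counts copied from the two HISAT2 summaries above.
total_reads = 74_051_302
unaligned_unpaired = 6_949_421  # -U run: reads aligned 0 times
unaligned_paired = 6_617_630    # -1/-2 run: mates aligned 0 times

rate_unpaired = 100 * (total_reads - unaligned_unpaired) / total_reads
rate_paired = 100 * (total_reads - unaligned_paired) / total_reads
extra = unaligned_unpaired - unaligned_paired
print(f"unpaired {rate_unpaired:.2f}%, paired {rate_paired:.2f}%, "
      f"{extra} more reads placed in paired mode")
# unpaired 90.62%, paired 91.06%, 331791 more reads placed in paired mode
```

That is roughly 0.45% of all reads, which is why the two overall rates look so similar.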

Thanks in advance...


You got a slightly better overall alignment rate (91.06% vs. 90.62%) when you correctly treated the data as paired-end. You apparently have a very well-behaved dataset, so the difference isn't large. I expect, however, that if you compared the two MAPQ distributions, you'd notice a larger benefit from paired-end alignment.
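One way to make that comparison: a minimal Python sketch (the function and script names are hypothetical) that tallies the MAPQ of primary, mapped records in each SAM file. It relies only on the standard SAM columns and FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary).

```python
# Sketch: tally MAPQ of primary, mapped records in SAM files so the
# paired (-1/-2) and unpaired (-U) runs can be compared side by side.
import sys
from collections import Counter
from typing import Iterable

UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def mapq_histogram(sam_lines: Iterable[str]) -> Counter:
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete SAM alignment record
            continue
        flag = int(fields[1])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        counts[int(fields[4])] += 1  # column 5 is MAPQ
    return counts


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as fh:  # read line by line; SAM files are large
            hist = mapq_histogram(fh)
        total = sum(hist.values())
        print(path)
        for mapq in sorted(hist):
            share = 100 * hist[mapq] / total
            print(f"  MAPQ {mapq:>3}: {hist[mapq]:>10} ({share:.2f}%)")
```

Saved as, say, `mapq_hist.py`, it could be run as `python mapq_hist.py /media/500/output.sam /media/500/output2.sam`. Both runs produce one primary record per read (or mate), so the percentages are directly comparable.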


