Question: HISAT2: differences between paired and unpaired alignment rates
Asked 18 months ago by gwe3409 (Seattle, WA):

Why do the alignment rates differ when the same data are run as paired-end versus unpaired?

I am new to genomic data analysis, and I had assumed that paired data make alignment more efficient because reads start from both the 5' and 3' ends of each fragment. Judging from the difference, I think I was wrong.

I am using HISAT2, and NCBI identifies my data as paired. The entry has this information:

Strategy: RNA-Seq
Selection: cDNA
Layout: PAIRED

I used the same hisat2 command for both runs, except that I passed the files with -1/-2 for the paired run and with -U for the unpaired run.

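# paired-end run: mate files passed with -1/-2
hisat2 --dta -p 8 -t -x /home/gerald/hg38/genome -S /media/500/output.sam \
        -1 xxx.fastq.gz -2 yyy.fastq.gz
# unpaired run: both files passed to -U as one comma-separated list
hisat2 --dta -p 8 -t -x /home/gerald/hg38/genome -S /media/500/output2.sam \
        -U xxx.fastq.gz,yyy.fastq.gz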

Running the data as paired, I get:

Time loading forward index: 00:00:06
Time loading reference: 00:00:02
Multiseed full-index search: 00:38:48
37025651 reads; of these:
  37025651 (100.00%) were paired; of these:
    4900709 (13.24%) aligned concordantly 0 times
    21078023 (56.93%) aligned concordantly exactly 1 time
    11046919 (29.84%) aligned concordantly >1 times
    4900709 pairs aligned concordantly 0 times; of these:
      164376 (3.35%) aligned discordantly 1 time
    4736333 pairs aligned 0 times concordantly or discordantly; of these:
      9472666 mates make up the pairs; of these:
        6617630 (69.86%) aligned 0 times
        1583800 (16.72%) aligned exactly 1 time
        1271236 (13.42%) aligned >1 times
91.06% overall alignment rate
Time searching: 00:38:53
Overall time: 00:39:00
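The 91.06% figure can be reproduced from the breakdown above, assuming HISAT2 counts both mates of every concordant or discordant pair as aligned, counts the mates of the remaining pairs individually, and divides by the total number of mates (two per pair). A minimal Python sketch; `paired_overall_rate` is a hypothetical helper, not part of HISAT2:

```python
def paired_overall_rate(total_pairs, conc_once, conc_multi, disc_once,
                        unpaired_mates_once, unpaired_mates_multi):
    """Overall alignment rate (%) from a HISAT2 paired-end summary.

    Assumption: both mates of every concordant or discordant pair count
    as aligned, mates from the remaining pairs count individually, and
    the denominator is the total number of mates (2 per pair).
    """
    total_mates = 2 * total_pairs
    aligned_mates = (2 * (conc_once + conc_multi + disc_once)
                     + unpaired_mates_once + unpaired_mates_multi)
    return 100.0 * aligned_mates / total_mates


# Counts copied from the paired-end summary above.
rate = paired_overall_rate(
    total_pairs=37025651,
    conc_once=21078023,
    conc_multi=11046919,
    disc_once=164376,
    unpaired_mates_once=1583800,
    unpaired_mates_multi=1271236,
)
print(f"{rate:.2f}% overall alignment rate")  # 91.06% overall alignment rate
```

The same number of aligned mates (67,433,672) also falls out of the last block of the log: 74,051,302 mates minus the 6,617,630 that aligned 0 times.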

Just to see, I also ran this data as unpaired.

Time loading forward index: 00:00:04
Time loading reference: 00:00:02
Multiseed full-index search: 00:36:31
74051302 reads; of these:
  74051302 (100.00%) were unpaired; of these:
    6949421 (9.38%) aligned 0 times
    44591138 (60.22%) aligned exactly 1 time
    22510743 (30.40%) aligned >1 times
90.62% overall alignment rate
Time searching: 00:36:34
Overall time: 00:36:38
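Because both runs used the same 74,051,302 reads (37,025,651 pairs × 2), the two summaries can be compared directly by counting the reads that aligned at least once. A minimal sketch using only numbers copied from the two logs; the variable names are illustrative:

```python
# Counts copied from the paired and unpaired HISAT2 summaries above.
total_reads = 74051302            # 2 x 37025651 pairs; also the -U read count
unaligned_paired_run = 6617630    # mates that "aligned 0 times" with -1/-2
unaligned_unpaired_run = 6949421  # reads that "aligned 0 times" with -U

aligned_paired_run = total_reads - unaligned_paired_run
aligned_unpaired_run = total_reads - unaligned_unpaired_run
gain = aligned_paired_run - aligned_unpaired_run

print(f"paired:   {100 * aligned_paired_run / total_reads:.2f}%")    # paired:   91.06%
print(f"unpaired: {100 * aligned_unpaired_run / total_reads:.2f}%")  # unpaired: 90.62%
print(f"extra reads aligned in the paired run: {gain}")              # 331791
```

That net gain of 331,791 reads is about 0.45 percentage points of all reads.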

Thanks in advance...

Tags: hisat2, alignment, assembly
Answered 18 months ago by Devon Ryan (Freiburg, Germany):

You got a slightly better alignment rate when you correctly treated the data as paired-end. You apparently have a very well-behaved dataset, so the difference isn't large. I expect, however, that if you looked at the two MAPQ distributions, you'd notice a larger benefit of paired-end alignment.
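One way to follow this suggestion is to tabulate MAPQ (column 5 of each SAM record) for the two output files from the question, output.sam and output2.sam. A minimal Python sketch; `mapq_histogram` and the script name are hypothetical, and it counts only primary, mapped alignments (FLAG bits 0x4, 0x100 and 0x800 unset) so that each read contributes at most once:

```python
import sys
from collections import Counter

# SAM FLAG bits for alignments that should not be counted.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800
SKIP_FLAGS = UNMAPPED | SECONDARY | SUPPLEMENTARY  # == 0x904


def mapq_histogram(sam_lines):
    """Count MAPQ values (SAM column 5) of primary, mapped alignments."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # blank or truncated record
            continue
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & SKIP_FLAGS:             # unmapped, secondary or supplementary
            continue
        counts[mapq] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical script name): python mapq_hist.py output.sam output2.sam
    for path in sys.argv[1:]:
        with open(path) as handle:
            hist = mapq_histogram(handle)
        total = sum(hist.values()) or 1   # avoid dividing by zero on empty input
        print(path)
        for mapq in sorted(hist):
            share = 100 * hist[mapq] / total
            print(f"  MAPQ {mapq:>3}: {hist[mapq]:>10}  ({share:.2f}%)")
```

If samtools is installed, a roughly equivalent pipeline is `samtools view -F 0x904 output.sam | cut -f 5 | sort -n | uniq -c`.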
