Question: HISAT2: Differences in paired and unpaired alignment rates
gwe3409 (Seattle, WA) wrote, 21 months ago:

Why do the alignment rates differ when the same data are run as paired-end versus unpaired?

I am new to genomic data analysis, and I had assumed that paired data makes alignment more efficient because the aligner can use both the 5' and 3' starting points. Given the difference, I think I am wrong here.

I am using HISAT2, and NCBI identifies my data as paired. The entry has this information:

Strategy: RNA-Seq
Source: TRANSCRIPTOMIC
Selection: cDNA
Layout: PAIRED

I used the same hisat2 command for both runs, except that I passed the files with -1/-2 for the paired run and with -U for the unpaired run.

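# paired-end run: mates passed separately with -1 and -2
hisat2 --dta -p 8 -t -x /home/gerald/hg38/genome -S /media/500/output.sam \
        -1 xxx.fastq.gz -2 yyy.fastq.gz
# unpaired run: both files passed as one comma-separated list to -U
hisat2 --dta -p 8 -t -x /home/gerald/hg38/genome -S /media/500/output2.sam \
        -U xxx.fastq.gz,yyy.fastq.gz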

Running this as paired data, I get:

Time loading forward index: 00:00:06
Time loading reference: 00:00:02
Multiseed full-index search: 00:38:48
37025651 reads; of these:
  37025651 (100.00%) were paired; of these:
    4900709 (13.24%) aligned concordantly 0 times
    21078023 (56.93%) aligned concordantly exactly 1 time
    11046919 (29.84%) aligned concordantly >1 times
    ----
    4900709 pairs aligned concordantly 0 times; of these:
      164376 (3.35%) aligned discordantly 1 time
    ----
    4736333 pairs aligned 0 times concordantly or discordantly; of these:
      9472666 mates make up the pairs; of these:
        6617630 (69.86%) aligned 0 times
        1583800 (16.72%) aligned exactly 1 time
        1271236 (13.42%) aligned >1 times
91.06% overall alignment rate
Time searching: 00:38:53
Overall time: 00:39:00
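Note that the 91.06% overall rate is counted per mate, not per pair, which is what makes it directly comparable with an unpaired run. The short Python check below reproduces it from the counts in the log, assuming (as the numbers bear out) that the only unaligned mates are the 6617630 reported as "aligned 0 times"; the function name `paired_overall_rate` is hypothetical and not part of HISAT2.

```python
def paired_overall_rate(pairs: int, unaligned_mates: int) -> float:
    """Overall alignment rate (%) for a paired run, counted per mate.

    Every pair contributes two mates; the only mates left unaligned are the
    ones reported as "aligned 0 times" in the last block of the summary.
    """
    total_mates = 2 * pairs
    return 100.0 * (total_mates - unaligned_mates) / total_mates


# Counts copied from the paired-end summary above.
rate = paired_overall_rate(pairs=37025651, unaligned_mates=6617630)
print(f"{rate:.2f}% overall alignment rate")  # 91.06% overall alignment rate
```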

Just to see, I also ran this data as unpaired.

Time loading forward index: 00:00:04
Time loading reference: 00:00:02
Multiseed full-index search: 00:36:31
74051302 reads; of these:
  74051302 (100.00%) were unpaired; of these:
    6949421 (9.38%) aligned 0 times
    44591138 (60.22%) aligned exactly 1 time
    22510743 (30.40%) aligned >1 times
90.62% overall alignment rate
Time searching: 00:36:34
Overall time: 00:36:38
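Both summaries cover the same 74051302 reads (37025651 pairs × 2), so the two runs can be compared read for read. A minimal Python sketch of that comparison, using only counts from the two logs (the constant and function names are illustrative):

```python
# Counts copied from the two HISAT2 summaries above.
TOTAL_MATES = 74051302            # 2 x 37025651 pairs, or 74051302 unpaired reads
UNALIGNED_PAIRED_RUN = 6617630    # mates "aligned 0 times" in the paired run
UNALIGNED_UNPAIRED_RUN = 6949421  # reads "aligned 0 times" in the unpaired run


def overall_rate(total: int, unaligned: int) -> float:
    """Percentage of reads/mates with at least one alignment."""
    return 100.0 * (total - unaligned) / total


paired = overall_rate(TOTAL_MATES, UNALIGNED_PAIRED_RUN)
unpaired = overall_rate(TOTAL_MATES, UNALIGNED_UNPAIRED_RUN)
# How many fewer reads end up unaligned when pairing information is used.
fewer_unaligned = UNALIGNED_UNPAIRED_RUN - UNALIGNED_PAIRED_RUN

print(f"paired:   {paired:.2f}%")    # paired:   91.06%
print(f"unpaired: {unpaired:.2f}%")  # unpaired: 90.62%
print(f"{fewer_unaligned} fewer unaligned reads in the paired run")
```

It reports 331791 fewer unaligned reads in the paired run, a gain of roughly 0.45 percentage points (the rounded log values, 91.06% and 90.62%, differ by 0.44).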

Thanks in advance...

Answer: Devon Ryan (Freiburg, Germany) wrote, 21 months ago:

You got a slightly better alignment rate (91.06% vs. 90.62%) when you correctly treated the reads as paired-end. You apparently have a very well-behaved dataset, so the difference isn't large. I expect, however, that if you compared the MAPQ distributions of the two runs, you would notice a larger benefit from paired-end alignment.
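One way to follow this suggestion is to tally the MAPQ values (SAM column 5) of the primary alignments in output.sam and output2.sam and compare them side by side. The Python sketch below assumes plain-text SAM input and skips unmapped, secondary and supplementary records, so each read or mate is counted once in both runs; the script name `mapq_compare.py` is hypothetical. MAPQ scales are aligner-specific, so the shapes of the two distributions are more informative than any single value.

```python
import sys
from collections import Counter
from collections.abc import Iterable

# SAM FLAG bits for records that should not be counted once per read.
UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def mapq_histogram(sam_lines: Iterable[str]) -> Counter:
    """Tally MAPQ (SAM column 5) over primary, mapped alignment records."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        counts[mapq] += 1
    return counts


def compare(paired: Counter, unpaired: Counter) -> None:
    """Print the fraction of primary alignments at each MAPQ value for both runs."""
    paired_total = sum(paired.values()) or 1
    unpaired_total = sum(unpaired.values()) or 1
    print("MAPQ\tpaired\tunpaired")
    for mapq in sorted(set(paired) | set(unpaired)):
        print(f"{mapq}\t{paired[mapq] / paired_total:.3f}\t"
              f"{unpaired[mapq] / unpaired_total:.3f}")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python mapq_compare.py output.sam output2.sam
    with open(sys.argv[1]) as fh_paired, open(sys.argv[2]) as fh_unpaired:
        compare(mapq_histogram(fh_paired), mapq_histogram(fh_unpaired))
```

For example: `python mapq_compare.py /media/500/output.sam /media/500/output2.sam`.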
