HISAT2: Differences in paired and unpaired rates
7.3 years ago
gwe3409 ▴ 10

Why do the alignment rates differ when the same data is run as paired versus unpaired?

I am new to genomic data analysis and had assumed that paired data makes the alignment more efficient because it uses 5' or 3' starting points. Judging from the difference, I think I am wrong here.

I am using hisat2 and I have what is identified as paired data from NCBI. The entry has this information:

Strategy: RNA-Seq
Source: TRANSCRIPTOMIC
Selection: cDNA
Layout: PAIRED

I used the same hisat2 command for both runs, except for using -1 and -2 for paired and -U for unpaired.

hisat2 --dta -p 8 -t -x /home/gerald/hg38/genome -S /media/500/output.sam \
        -1 xxx.fastq.gz -2 yyy.fastq.gz
hisat2 --dta -p 8 -t -x /home/gerald/hg38/genome -S /media/500/output2.sam \
        -U xxx.fastq.gz,yyy.fastq.gz

Running this as paired data I get:

Time loading forward index: 00:00:06
Time loading reference: 00:00:02
Multiseed full-index search: 00:38:48
37025651 reads; of these:
  37025651 (100.00%) were paired; of these:
    4900709 (13.24%) aligned concordantly 0 times
    21078023 (56.93%) aligned concordantly exactly 1 time
    11046919 (29.84%) aligned concordantly >1 times
    ----
    4900709 pairs aligned concordantly 0 times; of these:
      164376 (3.35%) aligned discordantly 1 time
    ----
    4736333 pairs aligned 0 times concordantly or discordantly; of these:
      9472666 mates make up the pairs; of these:
        6617630 (69.86%) aligned 0 times
        1583800 (16.72%) aligned exactly 1 time
        1271236 (13.42%) aligned >1 times
91.06% overall alignment rate
Time searching: 00:38:53
Overall time: 00:39:00

Just to see, I also ran this data as unpaired.

Time loading forward index: 00:00:04
Time loading reference: 00:00:02
Multiseed full-index search: 00:36:31
74051302 reads; of these:
  74051302 (100.00%) were unpaired; of these:
    6949421 (9.38%) aligned 0 times
    44591138 (60.22%) aligned exactly 1 time
    22510743 (30.40%) aligned >1 times
90.62% overall alignment rate
Time searching: 00:36:34
Overall time: 00:36:38
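
As a sanity check on the two summaries above, the overall rates can be reproduced from the "aligned 0 times" counts. This is only a sketch of the arithmetic: the `overall_rate` helper is a hypothetical name, and it assumes the overall rate is one minus the fraction of reads (or mates) that never aligned, which matches both reported figures.

```shell
# Overall alignment rate = 100 * (1 - unaligned / total), printed with two decimals.
# The counts passed below are copied from the two HISAT2 summaries.
overall_rate() {
    # $1 = reads or mates that aligned 0 times, $2 = total reads or mates
    awk -v unaligned="$1" -v total="$2" \
        'BEGIN { printf "%.2f\n", 100 * (1 - unaligned / total) }'
}

# Paired run: 37025651 pairs = 74051302 mates, of which 6617630 aligned 0 times.
overall_rate 6617630 $((2 * 37025651))

# Unpaired run: 74051302 reads, of which 6949421 aligned 0 times.
overall_rate 6949421 74051302
```

The two calls give 91.06 and 90.62, so in absolute terms the paired run aligned 6949421 - 6617630 = 331791 more mates than the unpaired run.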

Thanks in advance...

7.3 years ago

You got a slightly better alignment rate (91.06% versus 90.62%) when you correctly treated the reads as paired-end. You apparently have a very well-behaved dataset, so the difference isn't large. I expect, however, that if you compared the two MAPQ distributions you would notice a larger benefit from paired-end alignment.
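
One way to compare the MAPQ distributions is to tabulate column 5 of each SAM file. The following is a minimal sketch, not a definitive recipe: `mapq_hist` is a hypothetical helper name, it uses plain awk rather than a dedicated SAM tool, and it assumes you only want mapped, primary alignments (FLAG bits 0x4 and 0x100 unset).

```shell
# Print "MAPQ<TAB>count" for mapped, primary alignments in a SAM file.
mapq_hist() {
    awk -F'\t' '
        /^@/ { next }                      # skip SAM header lines
        int($2 / 4) % 2 == 1 { next }      # FLAG bit 0x4 set: read is unmapped
        int($2 / 256) % 2 == 1 { next }    # FLAG bit 0x100 set: secondary alignment
        { count[$5]++ }                    # column 5 is MAPQ
        END { for (q in count) print q "\t" count[q] }
    ' "$1" | sort -n
}

# Example usage with the output paths from the question:
# mapq_hist /media/500/output.sam
# mapq_hist /media/500/output2.sam
```

Comparing the two tables side by side should show whether the paired run moves reads from low to high MAPQ, even though the overall alignment rates are nearly the same.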
