Why do the alignment rates differ when the same data is run as paired versus unpaired?
I am new to genomic data analysis, and I had assumed that paired data makes the alignment more efficient because the aligner can use the 5' or 3' starting points. Given the difference in the results below, I think I am wrong here.
I am using hisat2 and I have what is identified as paired data from NCBI. The entry has this information:
Strategy: RNA-Seq
Source: TRANSCRIPTOMIC
Selection: cDNA
Layout: PAIRED
I used the same hisat2 command for both runs, except for using -1 and -2 in the paired run and -U in the unpaired run.
hisat2 --dta -p 8 -t -x /home/gerald/hg38/genome -S /media/500/output.sam \
    -1 xxx.fastq.gz -2 yyy.fastq.gz

hisat2 --dta -p 8 -t -x /home/gerald/hg38/genome -S /media/500/output2.sam \
    -U xxx.fastq.gz,yyy.fastq.gz
Running this as paired data I get:
Time loading forward index: 00:00:06
Time loading reference: 00:00:02
Multiseed full-index search: 00:38:48
37025651 reads; of these:
37025651 (100.00%) were paired; of these:
4900709 (13.24%) aligned concordantly 0 times
21078023 (56.93%) aligned concordantly exactly 1 time
11046919 (29.84%) aligned concordantly >1 times
----
4900709 pairs aligned concordantly 0 times; of these:
164376 (3.35%) aligned discordantly 1 time
----
4736333 pairs aligned 0 times concordantly or discordantly; of these:
9472666 mates make up the pairs; of these:
6617630 (69.86%) aligned 0 times
1583800 (16.72%) aligned exactly 1 time
1271236 (13.42%) aligned >1 times
91.06% overall alignment rate
Time searching: 00:38:53
Overall time: 00:39:00
Just to see, I also ran this data as unpaired and got:
Time loading forward index: 00:00:04
Time loading reference: 00:00:02
Multiseed full-index search: 00:36:31
74051302 reads; of these:
74051302 (100.00%) were unpaired; of these:
6949421 (9.38%) aligned 0 times
44591138 (60.22%) aligned exactly 1 time
22510743 (30.40%) aligned >1 times
90.62% overall alignment rate
Time searching: 00:36:34
Overall time: 00:36:38
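To check that I am reading the two summaries the same way, here is a small sketch that recomputes both overall rates from the counts above. It assumes the overall alignment rate is simply the share of individual reads (mates) that aligned at least once in any way, so for the paired run the denominator is 2 x 37025651 reads. The function and variable names are my own and are not part of hisat2.

```python
# Counts copied from the two hisat2 summaries above.
# Assumption: "overall alignment rate" = reads aligned at least once / all reads,
# counting each mate of a pair as one read.

TOTAL_READS = 2 * 37025651  # 74051302, same total as in the unpaired run

PAIRED_UNALIGNED = 6617630    # mates that "aligned 0 times" in the paired run
UNPAIRED_UNALIGNED = 6949421  # reads that "aligned 0 times" in the unpaired run


def overall_rate(unaligned_reads, total_reads):
    """Return the percentage of reads that aligned at least once."""
    return 100.0 * (total_reads - unaligned_reads) / total_reads


paired_rate = overall_rate(PAIRED_UNALIGNED, TOTAL_READS)
unpaired_rate = overall_rate(UNPAIRED_UNALIGNED, TOTAL_READS)

# Reads that found an alignment in the paired run but not in the unpaired run.
extra_aligned_in_paired = UNPAIRED_UNALIGNED - PAIRED_UNALIGNED

print(f"paired:   {paired_rate:.2f}%")
print(f"unpaired: {unpaired_rate:.2f}%")
print(f"extra reads aligned in paired mode: {extra_aligned_in_paired}")
```

Under that assumption the numbers match the reported 91.06% and 90.62%, so the gap comes down to roughly 330 thousand reads that align in paired mode but not in unpaired mode.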
Thanks in advance...