BWA MEM alignment output of split FASTQ files differs from that of the original (unsplit) FASTQ file
7.8 years ago

Hi,

I want to align paired-end FASTQ files (R1 and R2) with BWA MEM. Because the aligner takes a long time on a single huge FASTQ file, I split the R1 and R2 files into multiple smaller FASTQ files (each keeping the reads in the same order as in the original file) and aligned each small R1/R2 pair separately. I then merged the resulting small SAM files and compared the merged SAM file with the SAM file generated from the original (huge) FASTQ files, using the Picard "CompareSAMs" command. The two SAM files differ for a significant number of reads.
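For illustration, the splitting step described above could look roughly like the following Python sketch. It assumes uncompressed FASTQ files with exactly four lines per record. The function names (`read_fastq`, `split_paired_fastq`) and the chunk file naming scheme are hypothetical and not part of BWA or Picard.

```python
import itertools


def read_fastq(handle):
    """Yield one FASTQ record at a time as a tuple of 4 lines."""
    while True:
        record = tuple(itertools.islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def split_paired_fastq(r1_path, r2_path, reads_per_chunk, prefix):
    """Split R1 and R2 in lockstep so every chunk keeps its mates together.

    Returns a list of (r1_chunk_path, r2_chunk_path) tuples.
    """
    chunk_paths = []
    out1 = out2 = None
    try:
        with open(r1_path) as r1, open(r2_path) as r2:
            pairs = itertools.zip_longest(read_fastq(r1), read_fastq(r2))
            for i, (rec1, rec2) in enumerate(pairs):
                if rec1 is None or rec2 is None:
                    raise ValueError("R1 and R2 contain different numbers of reads")
                if i % reads_per_chunk == 0:
                    # Start a new pair of chunk files.
                    if out1 is not None:
                        out1.close()
                        out2.close()
                    n = i // reads_per_chunk
                    paths = (f"{prefix}_{n:04d}_R1.fastq",
                             f"{prefix}_{n:04d}_R2.fastq")
                    out1, out2 = open(paths[0], "w"), open(paths[1], "w")
                    chunk_paths.append(paths)
                out1.writelines(rec1)
                out2.writelines(rec2)
    finally:
        if out1 is not None:
            out1.close()
            out2.close()
    return chunk_paths
```

Each returned pair of chunk files can then be aligned on its own, since read i of an R1 chunk always has its mate as read i of the matching R2 chunk.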

Can anybody please let me know whether I am doing this the right way, or should I stick to the original files only?

If differences are expected, what might be the reason?

Any help on this is really appreciated.

next-gen alignment • 3.5k views
1. What version of bwa?
2. What sorts of MAPQ values do the discordant alignments have?

Issues like this get reported from time to time, and typically they're due to the random seeding step, though I think it got fixed at least once (see, for example, the thread "Bwa Mem Have Different Alignment Result When Using Different Threads").

BWA version is 0.7.10-r789

All discordant alignments have a mapping quality of zero. I also tried changing the number of threads, but that seems fine: the results do not change.

If the differences are only between alignments with a MAPQ of 0, then that's expected. A MAPQ of 0 means the read aligns equally well to more than one location, and the reported location is chosen at random among them.
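One way to check whether the differences really are confined to MAPQ 0 is to tally the discordant reads by mapping quality. The following is a minimal sketch, assuming plain-text SAM files small enough to hold in memory. The function names are hypothetical, and only the reference name and position of primary alignments are compared (not strand or CIGAR).

```python
from collections import Counter


def primary_alignments(sam_path):
    """Map (read name, mate number) -> (reference, position, MAPQ)."""
    result = {}
    with open(sam_path) as sam:
        for line in sam:
            if line.startswith("@"):  # header line
                continue
            fields = line.rstrip("\n").split("\t")
            flag = int(fields[1])
            if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800)
                continue
            # 0x40 = first in pair, 0x80 = second in pair, 0 = unpaired
            mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
            result[(fields[0], mate)] = (fields[2], int(fields[3]), int(fields[4]))
    return result


def discordant_mapq_counts(sam_a, sam_b):
    """Count reads placed differently in the two files, keyed by MAPQ.

    The key is the higher of the two MAPQ values, so a result containing
    only the key 0 means every discordant read has MAPQ 0 in both files.
    """
    a = primary_alignments(sam_a)
    b = primary_alignments(sam_b)
    counts = Counter()
    for key in a.keys() & b.keys():
        ref_a, pos_a, mapq_a = a[key]
        ref_b, pos_b, mapq_b = b[key]
        if (ref_a, pos_a) != (ref_b, pos_b):
            counts[max(mapq_a, mapq_b)] += 1
    return counts
```

If the returned counter has entries for nonzero MAPQ values, the discrepancy is not explained by random placement of multi-mapping reads alone.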

Does this mean that whether I run the original FASTQ file or multiple split FASTQ files (generated from the original), the alignment output will not differ for reads with nonzero mapping quality? If so, can I split the input, run the aligner in parallel (on a distributed network), and later merge the SAM files to get reliable results?