Question: BWA MEM alignment output of split FASTQ files differs from that of the original (unsplit) FASTQ file
4.4 years ago by
manojkumar_bhosale (70) wrote:


I want to align paired-end FASTQ files (R1 and R2) with BWA MEM. Because the aligner takes a long time on a single huge FASTQ file, I split the R1 and R2 files into multiple smaller FASTQ files (each keeping the reads in the same order as in the original file) and aligned each small R1/R2 pair separately. I then merged the resulting small SAM files and compared the merged SAM file with the SAM file generated from the original (huge) FASTQ files, using the Picard "CompareSAMs" command. The two SAM files differ by a significant number of reads.
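For reference, the splitting step described above can be sketched in Python as follows. This is only an illustration, not the exact procedure used in the post: the function name `split_paired_fastq` and the `reads_per_chunk` parameter are hypothetical, and the sketch assumes plain 4-line FASTQ records with R1 and R2 listing mates in the same order.

```python
from itertools import islice


def split_paired_fastq(r1_lines, r2_lines, reads_per_chunk):
    """Yield (r1_chunk, r2_chunk) lists of lines, keeping mates in the same chunk.

    Hypothetical helper: assumes 4-line FASTQ records and identical read
    order in R1 and R2.
    """
    r1_iter, r2_iter = iter(r1_lines), iter(r2_lines)
    lines_per_chunk = 4 * reads_per_chunk
    while True:
        # Take the same number of lines from both files so pairs stay together.
        r1_chunk = list(islice(r1_iter, lines_per_chunk))
        r2_chunk = list(islice(r2_iter, lines_per_chunk))
        if not r1_chunk and not r2_chunk:
            return
        if len(r1_chunk) != len(r2_chunk) or len(r1_chunk) % 4:
            raise ValueError("R1 and R2 are out of sync or truncated")
        yield r1_chunk, r2_chunk
```

Each yielded pair can be written to its own R1/R2 files and aligned independently. The length check is there so that a mismatch between the two files fails loudly instead of silently mispairing reads.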

Can anybody please tell me whether I am doing this the right way, or should I stick to the original files?

If differences are expected, what might be the reason?

Any help with this is much appreciated.

alignment next-gen • 2.1k views
modified 4.4 years ago • written 4.4 years ago by manojkumar_bhosale (70)
  1. What version of bwa?
  2. What sorts of MAPQ values do the discordant alignments have?

Issues like this get reported from time to time, and they are typically due to the random seeding step, though I think it got fixed at least once (see, for example, the thread "Bwa Mem Have Different Alignment Result When Using Different Threads").

written 4.4 years ago by Devon Ryan (92k)

BWA version is 0.7.10-r789

All discordant alignments have a mapping quality of zero. I also tried changing the number of threads, but that seems fine: the results do not change.


written 4.4 years ago by manojkumar_bhosale (70)

If the differences are only between alignments with MAPQ of 0 then that's expected. Those alignments are randomly chosen.
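One way to check whether the differences really are confined to MAPQ 0 alignments is to compare the two SAM files read by read. The following is a minimal sketch, not a replacement for Picard: the function names are hypothetical, it only looks at primary alignments (FLAG bits 0x100 and 0x800 unset), and it treats two alignments as discordant when their reference name or position differs.

```python
def primary_alignments(sam_lines):
    """Map (read name, mate number) -> (RNAME, POS, MAPQ) for primary records."""
    records = {}
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800)
            continue
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        records[(fields[0], mate)] = (fields[2], int(fields[3]), int(fields[4]))
    return records


def discordant_by_mapq(sam_a, sam_b):
    """Return (zero, nonzero): discordant reads with MAPQ 0 in both files, and the rest."""
    a, b = primary_alignments(sam_a), primary_alignments(sam_b)
    zero, nonzero = [], []
    for key in a.keys() & b.keys():
        chrom_a, pos_a, mapq_a = a[key]
        chrom_b, pos_b, mapq_b = b[key]
        if (chrom_a, pos_a) != (chrom_b, pos_b):
            (zero if mapq_a == 0 and mapq_b == 0 else nonzero).append(key)
    return sorted(zero), sorted(nonzero)
```

If the second list comes back empty, every discordant read has MAPQ 0 in both runs. Anything in the second list would deserve a closer look.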

written 4.4 years ago by Devon Ryan (92k)

Does this mean that, whether I run the original FASTQ file or multiple split FASTQ files (generated from the original), the alignment output will not differ for reads with non-zero mapping quality? If so, can I split the input, run the aligner in parallel (on a distributed network), and later merge the SAM files to get reliable results?
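The merge step mentioned here could look roughly like the sketch below. It is an assumption about how the per-chunk SAM files might be combined, not a description of what was actually run: `merge_sam_chunks` is a hypothetical name, and the sketch simply keeps the header of the first chunk and concatenates the alignment records in chunk order. In practice a dedicated tool such as `samtools merge` would normally be used.

```python
def merge_sam_chunks(chunks):
    """Concatenate SAM chunks, keeping header lines from the first chunk only.

    `chunks` is an iterable of iterables of SAM lines. Assumes all chunks
    were aligned against the same reference, so their @SQ lines agree.
    """
    merged = []
    for index, chunk in enumerate(chunks):
        for line in chunk:
            if line.startswith("@"):
                if index == 0:  # header lines are taken from the first chunk only
                    merged.append(line)
            else:
                merged.append(line)
    return merged
```

Because the chunks are concatenated in order, the merged records follow the read order of the original FASTQ files, which makes a read-by-read comparison against the unsplit run straightforward.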

written 4.3 years ago by manojkumar_bhosale (70)