Question: Bwa Mem Have Different Alignment Result When Using Different Threads
11
3.5 years ago, shl198 (United States) wrote:

Hi, I used bwa-mem to align paired-end reads. After trimming, the reads in one FASTQ file are 80 bp long and the reads in the mate FASTQ file are 79 bp long. When I used 8 threads, it mapped more reads than with the default of 1 thread. Now I am confused: changing the number of threads is supposed to change only the speed, so why does it affect the mapping result? Does anyone know the reason? Any comments are appreciated. Thank you.

Modified 3.4 years ago • written 3.5 years ago by shl198
2

What flags are you using? Does the behaviour persist with the -M flag? Can you identify where the differences are: i.e. reads mapped multiple times or a greater % of raw reads mapped?

Reply written 3.5 years ago by Alastair Kerr

I only set -t; for everything else I used the default settings. Do I have to use -M all the time? I used samtools flagstat to check the results, and every item differs. The biggest difference is in the number of properly paired reads.

Reply written 3.5 years ago by shl198
2

That could be a bug in the software, especially taking into account that race conditions are usually harder to debug and test, so they are more likely to slip through undetected. Which version of bwa are you using, and have you had a chance to see what was aligned differently between runs? I rely on bwa-mem in my current project and am really interested in learning more about this issue.

Reply written 3.5 years ago by Pavel Senin
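One way to see what was aligned differently between two runs is to reduce each SAM file to a sorted per-read summary and diff the two summaries. The following is only a minimal shell sketch, not something shipped with bwa or samtools: the function names `sam_primary_summary` and `sam_diff_runs` are hypothetical, and it assumes plain-text SAM input (for example, a BAM file converted with `samtools view`).

```shell
# Hypothetical helper: reduce a SAM file to one line per primary alignment:
#   read-name/mate-number, reference name, position, CIGAR
sam_primary_summary() {
    awk -F'\t' '
        /^@/ { next }                              # skip header lines
        {
            flag = $2
            if (int(flag / 256) % 2) next          # 0x100: secondary alignment
            if (int(flag / 2048) % 2) next         # 0x800: supplementary alignment
            mate = (int(flag / 128) % 2) ? 2 : 1   # 0x80: second read in pair
            print $1 "/" mate "\t" $3 "\t" $4 "\t" $6
        }' "$1" | LC_ALL=C sort
}

# Hypothetical helper: print the summary lines that differ between two runs.
# Lines starting with "<" come from the first file, ">" from the second.
sam_diff_runs() (
    first=$(mktemp) && second=$(mktemp) || exit 1
    sam_primary_summary "$1" > "$first"
    sam_primary_summary "$2" > "$second"
    diff "$first" "$second" | grep '^[<>]' || true   # no output if identical
    rm -f "$first" "$second"
)
```

For example, after converting two of the BAM files to SAM text, `sam_diff_runs run_t1.sam run_t8.sam` (file names are placeholders) lists every read whose primary alignment moved, changed CIGAR, or is present in only one run.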
1

How many more reads did it map?

Reply written 3.5 years ago by Giovanni M Dall'Olio

It mapped about 50,000 more reads.

Reply written 3.5 years ago by shl198
1

Please provide a reproducible example.

Reply written 3.5 years ago by Michael Dondrup
10
3.5 years ago, Pavel Senin (Los Alamos, NM) wrote:

I've just run a piece of my pipeline on my data using different numbers of threads, and the number of mapped reads is greater with more threads. A diff shows that some lines changed and some were added:

$ bwa

Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.5a-r405
Contact: Heng Li <lh3@sanger.ac.uk>

The command lines:

bwa mem -t 1 -M dsm2059.fasta D170_R1_val_1.fq.gz D170_R2_val_2.fq.gz | samtools view -F 4 -Sbh - | samtools sort - D170.sorted_T1
bwa mem -t 2 -M dsm2059.fasta D170_R1_val_1.fq.gz D170_R2_val_2.fq.gz | samtools view -F 4 -Sbh - | samtools sort - D170.sorted_T2
bwa mem -t 3 -M dsm2059.fasta D170_R1_val_1.fq.gz D170_R2_val_2.fq.gz | samtools view -F 4 -Sbh - | samtools sort - D170.sorted_T3
bwa mem -t 4 -M dsm2059.fasta D170_R1_val_1.fq.gz D170_R2_val_2.fq.gz | samtools view -F 4 -Sbh - | samtools sort - D170.sorted_T4

The number of aligned reads is somewhat different:

$ samtools flagstat D170.sorted_T1.bam
53260 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
53260 + 0 mapped (100.00%:-nan%)
53260 + 0 paired in sequencing
26733 + 0 read1
26527 + 0 read2
32537 + 0 properly paired (61.09%:-nan%)
32934 + 0 with itself and mate mapped
20326 + 0 singletons (38.16%:-nan%)
85 + 0 with mate mapped to a different chr
65 + 0 with mate mapped to a different chr (mapQ>=5)

$ samtools flagstat D170.sorted_T2.bam
53388 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
53388 + 0 mapped (100.00%:-nan%)
53388 + 0 paired in sequencing
26797 + 0 read1
26591 + 0 read2
32862 + 0 properly paired (61.55%:-nan%)
33117 + 0 with itself and mate mapped
20271 + 0 singletons (37.97%:-nan%)
83 + 0 with mate mapped to a different chr
64 + 0 with mate mapped to a different chr (mapQ>=5)

$ samtools flagstat D170.sorted_T3.bam
53419 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
53419 + 0 mapped (100.00%:-nan%)
53419 + 0 paired in sequencing
26815 + 0 read1
26604 + 0 read2
32939 + 0 properly paired (61.66%:-nan%)
33156 + 0 with itself and mate mapped
20263 + 0 singletons (37.93%:-nan%)
87 + 0 with mate mapped to a different chr
65 + 0 with mate mapped to a different chr (mapQ>=5)

$ samtools flagstat D170.sorted_T4.bam
53442 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
53442 + 0 mapped (100.00%:-nan%)
53442 + 0 paired in sequencing
26828 + 0 read1
26614 + 0 read2
33006 + 0 properly paired (61.76%:-nan%)
33181 + 0 with itself and mate mapped
20261 + 0 singletons (37.91%:-nan%)
83 + 0 with mate mapped to a different chr
64 + 0 with mate mapped to a different chr (mapQ>=5)

That read (shown in an attached image that is not preserved here) seems to be longer with 4 threads; maybe it has to do with seeds?

Modified 3.5 years ago • written 3.5 years ago by Pavel Senin
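To compare one flagstat field across several runs without reading the reports side by side, the count can be pulled out of each saved report. This is a minimal sketch under stated assumptions: the function names `flagstat_count` and `flagstat_table` are hypothetical, each report is assumed to have been saved to a text file first (for example with `samtools flagstat D170.sorted_T1.bam > T1.flagstat.txt`, where the output file name is a placeholder), and the label is matched as plain text against the first line that contains it.

```shell
# Hypothetical helper: print the QC-passed count (first column) of the first
# flagstat line that contains the given label.
#   $1: label text, e.g. "properly paired"
#   $2: file holding saved `samtools flagstat` output
flagstat_count() {
    awk -v label="$1" 'index($0, label) { print $1; exit }' "$2"
}

# Hypothetical helper: one "file<TAB>count" row per saved report.
#   $1: label text; remaining arguments: saved flagstat files
flagstat_table() (
    label=$1
    shift
    for report in "$@"; do
        printf '%s\t%s\n' "$report" "$(flagstat_count "$label" "$report")"
    done
)
```

Called as `flagstat_table 'properly paired' T1.flagstat.txt T2.flagstat.txt T3.flagstat.txt T4.flagstat.txt`, this would list the properly paired counts reported above (32537, 32862, 32939 and 33006) next to each other, which makes the steady increase with thread count easy to see.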
3

Out of curiosity, do you get identical results if you rerun one of these? I'm wondering whether a random number generator is being used somewhere for seeding, which might cause slightly different results with each run (in fact, from just grepping through the bwa source code, it makes use of random seeds and such in a number of places).

Reply written 3.5 years ago by Devon Ryan
2

I repeated runs with 8 threads and got identical results.

Reply written 3.5 years ago by shl198

Seconded: I reran with four and with three threads and got the same counts each time.

Reply modified 3.5 years ago • written 3.5 years ago by Pavel Senin
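A quick way to confirm that two reruns really are identical, rather than just having matching flagstat counts, is to checksum the alignment records themselves. The sketch below is an assumption-laden illustration, not part of bwa or samtools: `sam_records_checksum` is a hypothetical name, the input is assumed to be plain-text SAM, header lines are ignored because they may legitimately differ between runs, and records are sorted so that output order does not matter.

```shell
# Hypothetical helper: checksum the alignment records of a SAM file,
# ignoring header lines (those starting with "@") and record order.
sam_records_checksum() {
    grep -v '^@' "$1" | LC_ALL=C sort | cksum | awk '{ print $1 }'
}

# Hypothetical helper: report whether two runs produced the same records.
same_alignments() {
    if [ "$(sam_records_checksum "$1")" = "$(sam_records_checksum "$2")" ]; then
        echo "identical"
    else
        echo "different"
    fi
}
```

Two runs with the same thread count should print `identical` if the observation above holds; comparing a 1-thread run against an 8-thread run of the affected version would be expected to print `different`.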
7
3.4 years ago, shl198 (United States) wrote:

For those who may be concerned, I got an answer from the developer of bwa-mem. There is a bug in version 0.7.5a that affects runs with different numbers of threads. The versions on the master branch may work fine.

Written 3.4 years ago by shl198

Thanks, it's good to know that the problem is acknowledged.

Reply written 3.4 years ago by Pavel Senin

Is it all 0.7.5a versions? There are several revisions of 0.7.5a in the master branch.

Reply written 3.4 years ago by Dan Gaston

Only the released version on SourceForge has the bug; I think he fixed it on GitHub. This is what he said: "There is a bug in 0.7.5a which affects the randomness. The master branch in git should not have this issue if I am right".

Reply written 3.4 years ago by shl198
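To find out which build a pipeline is actually running, the version string can be read from the usage text that bwa prints when run without arguments (the `Version:` line shown earlier in this thread). The following is a rough sketch: `bwa_version_from_usage` and `is_possibly_affected` are hypothetical names, and treating every `0.7.5a` build as suspect is an assumption made here for caution, since the thread does not settle which revisions carry the bug.

```shell
# Hypothetical helper: read bwa's usage text on stdin and print the version
# string from its "Version:" line.
bwa_version_from_usage() {
    awk '$1 == "Version:" { print $2; exit }'
}

# Hypothetical helper: succeed if the version string is any 0.7.5a build.
# Assumption: all 0.7.5a revisions are treated as suspect.
is_possibly_affected() {
    case "$1" in
        0.7.5a*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Usage (redirecting stderr so the usage text is captured either way):
#   version=$(bwa 2>&1 | bwa_version_from_usage)
#   if is_possibly_affected "$version"; then
#       echo "bwa $version may give thread-dependent results" >&2
#   fi
```

Recording the version string next to each BAM file also makes it easier to tell later which results came from a possibly affected build.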

Sounds like it. That's a relief, as I have been using the 0.7.5a version from GitHub in my production runs for a while now. I am testing with the current GitHub version overnight/tomorrow to double-check.

Reply written 3.4 years ago by Dan Gaston

So, is this solved?

The developer doesn't sound certain that it's been fixed. Has anyone checked?

Reply written 3.4 years ago by Chris Cole