Question: Different results in bwa mem paired sequential and threaded versions
3.8 years ago by chemaster (Spain):

Hi all,

First of all, I am new to bioinformatics; I come from the High Performance Computing world. I am trying out bwa to check its performance and I have run into the following problem. If I run bwa mem on paired-end reads in sequential mode (that is, without the -t option) and then run it again on the same input with 8 threads (-t 8), the resulting SAM file differs in many places from the sequential one. These are the steps I followed.

1.- Download the reference from: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz

2.- Create the index with bwa index -a bwtsw

3.- Download the reads from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/NA12750/sequence_read/ERR000589_1.filt.fastq.gz and ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/NA12750/sequence_read/ERR000589_2.filt.fastq.gz

4.- Execute bwa in sequential mode with bwa mem Reference ERR000589_1.filt.fastq ERR000589_2.filt.fastq > ExitSequential.sam (I previously unzipped the input files)

5.- Same as in point 4 but adding the -t 8 option

6.- I ran diff on the two outputs and found many differences in the FLAG, MAPQ, RNEXT and PNEXT fields of the SAM files
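
A plain diff only shows that lines differ, not which fields are responsible. As a rough sketch of step 6 (not part of the original post), the records of the two outputs can be matched by read name and mate number and the differing mandatory fields counted. The function names and the file name ExitThreaded.sam are assumptions made for illustration; only ExitSequential.sam appears in the steps above.

```python
from collections import Counter

# The 11 mandatory SAM columns, in order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def index_records(lines):
    """Map (read name, mate number, secondary/supplementary) to the mandatory fields."""
    records = {}
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and empty lines
        fields = line.rstrip("\n").split("\t")[:11]
        flag = int(fields[1])
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        extra = bool(flag & 0x900)  # secondary (0x100) or supplementary (0x800)
        records.setdefault((fields[0], mate, extra), fields)  # keep first record per key
    return records


def count_field_differences(lines_a, lines_b):
    """Count, per field name, how many matched records disagree in that field."""
    a, b = index_records(lines_a), index_records(lines_b)
    counts = Counter()
    for key in a.keys() & b.keys():
        for name, x, y in zip(SAM_FIELDS, a[key], b[key]):
            if x != y:
                counts[name] += 1
    return counts


# Hypothetical usage; ExitThreaded.sam is an assumed name for the -t 8 output:
# with open("ExitSequential.sam") as seq, open("ExitThreaded.sam") as thr:
#     print(count_field_differences(seq, thr))
```

Both files are held in memory here, so for a full human data set a streaming comparison would be the more practical choice.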

Can somebody please help me with this? Thanks in advance.

The bwa version is 0.7.12-r1039 from github


This sort of behaviour has been reported on and off for the past year or so (see this thread: Bwa Mem Have Different Alignment Result When Using Different Threads). Keep in mind that if you compare multimapping alignments between the two settings, you shouldn't necessarily expect the same results.

written 3.8 years ago by Devon Ryan

According to the internet, when there are several equally good alignments, BWA picks one of them at random. So you probably wouldn't get identical results from two sequential runs either, unless maybe there is a fixed seed for every thread. I don't know.

written 3.8 years ago by 5heikki

I've read the thread Devon Ryan linked. But it is about a bug in version 0.7.5a that has already been fixed, and other people there said that they get the same results from the sequential and threaded versions. I think those people tried it with a small number of reads and not with a large input.

Regarding what 5heikki said, I have run the sequential version several times and obtained exactly the same output each time. This is because bwa always initializes its random number generator with the same seed (I think it is 11), so the randomly selected alignment is always the same.

This could be the reason why the results differ: if each thread initializes its own random number generator with the same seed, the sequence of random numbers each read sees is no longer the same as in the sequential run. I am not sure about this, though.

written 3.8 years ago by chemaster

Exactly, the question is whether the RNG is reinitialized per thread, per alignment, or per run. If it's initialized with a new seed for each alignment then the results should be the same regardless of thread number (presuming the seed is some function of the alignment). In the other two cases, however, different results would be produced.
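
The three seeding cases above can be illustrated with a toy model. This is only a sketch of the reasoning, not bwa's actual code: the function names are made up, the seed 11 is the value guessed earlier in the thread, and a "chunk" stands in for the reads handled by one thread.

```python
import random

SEED = 11  # value mentioned in the thread; treated here as an assumption


def pick_sequential(reads, n_candidates):
    """Per run: one generator for the whole run, consumed in read order."""
    rng = random.Random(SEED)
    return [rng.randrange(n_candidates) for _ in reads]


def pick_per_chunk(reads, n_candidates, n_chunks):
    """Per thread: every chunk restarts its own generator from the same seed."""
    size = max(1, -(-len(reads) // n_chunks))  # ceiling division
    picks = []
    for start in range(0, len(reads), size):
        rng = random.Random(SEED)
        picks += [rng.randrange(n_candidates) for _ in reads[start:start + size]]
    return picks


def pick_per_read(reads, n_candidates):
    """Per alignment: the seed is a function of the read, so partitioning cannot matter."""
    return [random.Random(f"{SEED}:{name}").randrange(n_candidates) for name in reads]


reads = [f"read{i}" for i in range(200)]  # hypothetical read names
# Repeating the sequential run reproduces the same picks.
print(pick_sequential(reads, 4) == pick_sequential(reads, 4))
# Splitting the same reads over 8 chunks changes which candidate later reads get.
print(pick_per_chunk(reads, 4, 8) == pick_sequential(reads, 4))
```

In this model only the per-read seeding gives picks that are independent of how the reads are divided among threads, which matches the argument above.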

If you filter out the multimappers, do you still obtain discordant alignments between the two runs?

written 3.8 years ago by Devon Ryan

Yes, if I filter with samtools view -q1 I still get discordances. Here is an example of a discordant alignment:

-> ERR000589.781818    97    18    63487149    60    51M    1    55326423    0    GCAGAACCCGTGTTCCTTTGCTTCCCTAAAGTGCGAAAGCTTTCAGAAATG    IIIIHIIIIIIII@ICII87GG8HG38A,41-15I+)$?#112011*(,+3    NM:i:1    MD:Z:39A11    AS:i:46    XS:i:20
<- ERR000589.781818    65    18    63487149    60    51M    7    148149099    0    GCAGAACCCGTGTTCCTTTGCTTCCCTAAAGTGCGAAAGCTTTCAGAAATG    IIIIHIIIIIIII@ICII87GG8HG38A,41-15I+)$?#112011*(,+3    NM:i:1    MD:Z:39A11    AS:i:46    XS:i:20

written 3.8 years ago by chemaster

That's not a different alignment, its mate is just aligned differently. Is the mate multimapped?
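
To make this concrete, the two FLAG values from the example (97 and 65) can be decoded bit by bit. The bit meanings come from the SAM specification; the short names and helper functions below are chosen for this sketch only.

```python
# FLAG bits as defined in the SAM specification; the short names are ours.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def changed_bits(flag_a, flag_b):
    """Return the names of the bits that differ between two FLAG values."""
    return decode_flag(flag_a ^ flag_b)


print(decode_flag(97))       # ['paired', 'mate_reverse', 'first_in_pair']
print(decode_flag(65))       # ['paired', 'first_in_pair']
print(changed_bits(97, 65))  # ['mate_reverse']
```

The only bit that changes describes the mate's strand, and the other differing columns (RNEXT and PNEXT) also describe the mate. RNAME, POS, MAPQ and CIGAR of the read itself are identical in both records.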

written 3.8 years ago by Devon Ryan

No, it is not multimapped.

One of the entries is from the sequential run; the other is from a run launched with 16 threads.

written 3.8 years ago by chemaster

That's not actually the question I asked. I realize that you showed the alignment of the same read from the single-threaded and multithreaded runs. The question pertains to that read's mate in each run. The alignment you showed is the same in both runs; the differences pertain only to its mate. So, if the mate is multimapping then you'll get this, but it's not a bug or otherwise odd behaviour.
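
One rough way to check this for the mate's record in each output is sketched below. The criteria are heuristic assumptions, not an official definition of multimapping: a record is flagged if its MAPQ is 0, if its suboptimal score XS is at least as high as its alignment score AS, or if it carries an XA tag with alternative hits. The function names are made up for this sketch.

```python
def parse_tags(fields):
    """Parse optional SAM fields (TAG:TYPE:VALUE) into a dict."""
    tags = {}
    for item in fields[11:]:
        tag, typ, value = item.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return tags


def looks_multimapped(sam_line):
    """Heuristic: MAPQ of 0, XS >= AS, or an XA tag suggests a multimapped read."""
    fields = sam_line.rstrip("\n").split("\t")
    mapq = int(fields[4])
    tags = parse_tags(fields)
    tied = "AS" in tags and "XS" in tags and tags["XS"] >= tags["AS"]
    return mapq == 0 or tied or "XA" in tags


# The read shown in the thread (sequence and qualities shortened to "*" here):
# MAPQ 60 and AS 46 well above XS 20, so it is not flagged.
example = "\t".join([
    "ERR000589.781818", "97", "18", "63487149", "60", "51M",
    "1", "55326423", "0", "*", "*",
    "NM:i:1", "MD:Z:39A11", "AS:i:46", "XS:i:20",
])
print(looks_multimapped(example))  # False
```

The informative test is to apply this to the mate's record (the one placed on chromosome 1 in one run and on chromosome 7 in the other), not to the read shown in the example.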

written 3.8 years ago by Devon Ryan

As an aside, this would indicate that the RNG isn't reseeded with each alignment. This makes sense, since that'd be computationally wasteful.

written 3.8 years ago by Devon Ryan