Question: When and why is bwa aln better than bwa mem?
7
dariober (7.8k), Glasgow - UK, wrote 2.7 years ago:

Hi, as the post title says: when and why should one prefer bwa aln over bwa mem?

The bwa docs say that bwa mem is preferable for longer reads (> 70 bp). But what is the disadvantage of using bwa mem for shorter reads?

Part of the reason I'm asking is that I have a variety of libraries, mostly paired-end, with read lengths ranging from ~40-70 bp to 150 bp after quality and adapter trimming. I'd rather use one tool for all the read lengths to keep things consistent, and bwa mem seems the best choice, unless there is some good reason to avoid it for reads between ~40 and 70 bp.

I have the impression (not tested) that bwa mem is much slower than bwa aln on shorter reads, but that's not an issue for me.

Thanks

Dario

bwa aln bwa mem comparison • 17k views
modified 2.7 years ago by Istvan Albert ♦♦ 71k • written 2.7 years ago by dariober (7.8k)

Hi, I am facing a similar problem.

I have 38 bp paired-end ChIP-seq data. Should I use bwa aln or bwa mem?

Thanks,

Ming

written 20 months ago by tangming2005 (2.1k)
1

To answer my own question: I tested using Teaser (http://teaser.cibiv.univie.ac.at/reports/8dc974f7ce99f6958012619c052e5597/index.html#section4), and bwa aln seems to be a little better than bwa mem for short 36 bp single-end reads.

written 10 days ago by tangming2005 (2.1k)
13
Istvan Albert ♦♦ 71k, University Park, USA, wrote 2.7 years ago:

There is a paper that you should read: http://arxiv.org/abs/1303.3997

But beyond that, here is a more practical comparison.

We are running a test in a lecture that focuses on alignment performance. For that we generated 20,000 reads from the Ebola genome with a pretty high (10%) sequencing error rate. We then ran bowtie2, bwa aln and bwa mem and attempted to align the reads back to the genome. The mapping rates were:

  • bowtie2: 30%
  • bwa aln: 25%
  • bwa mem: 85%

Of course, each of these mapping rates is for default settings that can be changed (see the comments below), but that's where we always start. From those it looks like bwa mem goes a step further and will find alignments where the other methods have already given up.
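
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how a mapping rate could be computed from an aligner's SAM output. It is not the script used in the lecture. The function name and the toy records are made up for illustration; the only thing taken from the SAM format itself is the meaning of the flag bits (0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary). Secondary and supplementary records are skipped so that each read is counted once, which matters for bwa mem because it can report split alignments.

```python
def mapping_rate(sam_lines):
    """Return the fraction of reads whose primary record is mapped.

    sam_lines: an iterable of SAM text lines (header lines are ignored).
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is the bitwise FLAG
        # Skip secondary (0x100) and supplementary (0x800) records so that
        # every read contributes exactly one record to the count.
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        # Bit 0x4 set means the read is unmapped.
        if not flag & 0x4:
            mapped += 1
    return mapped / total if total else 0.0


# Toy, hand-written records (hypothetical, not real alignments).
example = [
    "@SQ\tSN:ref\tLN:1000",
    "r1\t0\tref\t100\t60\t70M\t*\t0\t0\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r3\t16\tref\t500\t60\t70M\t*\t0\t0\t*\t*",
    "r3\t2048\tref\t900\t60\t30M40H\t*\t0\t0\t*\t*",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(mapping_rate(example))
```

With real data you would pass an open SAM file handle instead of the toy list. Note that for paired-end data this counts each mate separately.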

modified 2.7 years ago • written 2.7 years ago by Istvan Albert ♦♦ 71k

Hi Istvan, thanks for the reply. However, I think you misunderstood my question. From the bwa mem paper, the documentation, and your benchmark, it appears that bwa mem is always preferable to bwa aln, especially for longer reads. What I'm asking is: when is bwa aln a better choice than bwa mem? Can we forget about bwa aln altogether and just use bwa mem?

written 2.7 years ago by dariober (7.8k)
3

The way I see it, my test shows that bwa mem is far more robust to errors than the other aligners tested. Read length doesn't factor into this. Come to think of it, these reads were on the short side: 70 bp, which is what wgsim generates by default.

There might be reasons to use aln, but I look at it as a prior step that was necessary to get to the new method. In general there is little reason to keep using it.

In fact, the problems caused by misalignment are far more insidious than simply losing some data. It is tempting to think: OK, so you only get 50% rather than 80%. But no! The reality is far more troubling than that and probably warrants a separate post.

An aligner's failure to map a read is not random! This means that certain types of errors are biased toward certain parts of the genome or of the reads. SNP calling on the default bowtie2 alignment generates a large number of seemingly reliable SNP calls that are not actually true (we can tell because we know the genome that wgsim simulated the reads from). The bias towards certain types of errors makes them look like real signal.

To me this was eye-opening: the inability to align is not just data loss. We need to realize that it may also introduce substantial biases that are then impossible to correct later.

modified 2.7 years ago • written 2.7 years ago by Istvan Albert ♦♦ 71k

"Length of the reads don't factor into this": that's what I thought as well. But the way the mem docs are written suggested to me that bwa mem is not recommended for reads < 70 bp, hence my post.

written 2.7 years ago by dariober (7.8k)

Did you try local alignment with bowtie2 (just add --very-sensitive-local)? bowtie2 aligns end-to-end by default, and that's usually the cause of big differences between bwa mem and bowtie2 like the one you found, though I think bwa mem generally does local alignment better than bowtie2 anyway.

written 2.7 years ago by Devon Ryan (68k)

Making bowtie2 work better for this particular case was a homework assignment due this week and worth 10 extra bonus points. I have not corrected these yet, so I do not know the answer :-)

I tried --very-sensitive myself, and that only partially improved the mapping rate, to 63%. I just ran --very-sensitive-local and that gives about the same 63%: much better than the original, but still well below bwa mem.

modified 2.7 years ago • written 2.7 years ago by Istvan Albert ♦♦ 71k
4

I have been educated by my students' homework. Relaxing the seed mismatches to -N 1 has a substantial effect in this case. Combining that with the parameters that represent the --very-sensitive-local option leads to a bowtie2 mapping rate of 91%. Best-performing parameter settings:

-D 20 -R 3 -N 1 -L 20
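
As a rough sketch of how these settings might be put into a full bowtie2 call, the snippet below builds the command line in Python. The index name and file names are placeholders, and the explicit --local flag is an assumption on my part: the settings above only list -D, -R, -N and -L, and local mode is inferred from the mention of --very-sensitive-local. The -x, -U and -S options are bowtie2's standard index, unpaired-reads and SAM-output options.

```python
import shlex


def bowtie2_command(index, reads, out_sam):
    """Build a bowtie2 argument list using the settings quoted above."""
    return [
        "bowtie2",
        "--local",            # assumption: local mode, not stated in the settings line
        "-D", "20",           # seed extension attempts before giving up
        "-R", "3",            # re-seeding attempts for repetitive seeds
        "-N", "1",            # allow one mismatch in the seed
        "-L", "20",           # seed length
        "-x", index,          # bowtie2 index basename (placeholder)
        "-U", reads,          # unpaired reads file (placeholder)
        "-S", out_sam,        # SAM output file (placeholder)
    ]


cmd = bowtie2_command("ebola_index", "reads.fq", "bowtie2.sam")
print(shlex.join(cmd))
```

The list form can be handed directly to subprocess.run(cmd, check=True) if bowtie2 is installed; shlex.join is only used here to show the equivalent shell command.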

 

modified 2.7 years ago • written 2.7 years ago by Istvan Albert ♦♦ 71k

Those are the same settings needed to make bison, which uses bowtie2 internally, perform the same as bwa-meth, which uses bwa mem internally, on an untrimmed dataset, so that makes sense.

written 2.7 years ago by Devon Ryan (68k)