Question: Bowtie Paired-End Broken?
Pablo (Canada) wrote, 8.4 years ago:

According to this article in the Journal of Human Genetics (Apr-28, 2011), Bowtie just doesn't work with paired-end reads. For details, see table 2 (Bowtie maps only 0.02% of the reads in paired-end mode, whereas BWA maps 99.46%) and table 3 (only 24.6% of reads mapped, against 80.4% for BWA).

I have always found Bowtie extremely picky when mapping paired-end reads, and I usually use BWA. Nevertheless, I hear people saying that they use Bowtie all the time.

The question is: in your experience, is Bowtie really incapable of mapping paired-end reads, as this article states, or is this just a mistake by the authors?

Tags: paired, next-gen, bowtie, sequencing
modified 8.1 years ago by brentp • written 8.4 years ago by Pablo

I guess it's safe to say that our experiences disagree with this publication. Most of us have had problems with Bowtie's default maximum insert size, and maybe this leads to degraded sensitivity. But it is certainly never as bad as they claim.

written 8.4 years ago by Pablo

I've contacted the authors pointing them to your comments in this thread. They answered within a few hours with an updated version of the paper. In my opinion, this is a very professional attitude.

written 8.4 years ago by Pablo

@Pablo, that's pretty cool they were able (and willing) to update the paper.

written 8.4 years ago by brentp
brentp (Salt Lake City, UT) wrote, 8.4 years ago:

@Brad is right. Still, with the default parameters to dwgsim (which was used to generate the paired-end reads), bowtie should have mapped more reads than it did if the documentation and implementations of both bowtie and dwgsim were correct. I wanted to understand why.

I ran a small simulation on a single chromosome. Indeed, using the default parameters for bowtie (as the paper did), only about 0.1% of the reads are mapped.

If I increase the maximum insert size (-X/--maxins) to 700 (the default is 250), over 65% of the pairs are mapped. Plotting the insert sizes of the mapped pairs gives this:

[Figure: insert-size distribution of the mapped pairs]

So, although dnaa was told to use an average outer distance of 300, it actually seems to generate pairs with an inner distance of 300, since the mean is about 300 + 76 + 76 = 452.
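A minimal Python sketch of that arithmetic, assuming bowtie compares the full fragment length (mate 1 + gap + mate 2) against -X/--maxins, which is how the bowtie manual describes the option. The helper names are hypothetical, and the length check ignores mate orientation, which bowtie also enforces.

```python
# Hypothetical helpers relating dnaa/dwgsim's inner distance to the
# outer (fragment) length that bowtie's -X/--maxins is compared against.

BOWTIE_DEFAULT_MAXINS = 250  # default shown by `bowtie --help`
BOWTIE_DEFAULT_MININS = 0    # -I/--minins default


def outer_distance(inner_distance: int, read1_len: int, read2_len: int) -> int:
    """Fragment length spanned by both mates: the gap plus both read lengths."""
    return inner_distance + read1_len + read2_len


def within_insert_limits(fragment_len: int,
                         maxins: int = BOWTIE_DEFAULT_MAXINS,
                         minins: int = BOWTIE_DEFAULT_MININS) -> bool:
    """Length-only validity check (mate orientation is not modelled here)."""
    return minins <= fragment_len <= maxins


frag = outer_distance(300, 76, 76)
print(frag)                                     # 452
print(within_insert_limits(frag))               # False with the default -X 250
print(within_insert_limits(frag, maxins=700))   # True after raising -X to 700
```

Every simulated pair of this kind fails the default length check, which is consistent with the near-zero mapping rate above.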

The reason bwa maps them (even though it also has a default maximum insert size of 500) is that it doesn't actually use that parameter unless it is unable to infer the insert size from the reads itself. From the docs:

-a INT
Maximum insert size for a read pair to be considered being mapped properly. 
Since 0.4.5, this option is only used when there are not enough good 
alignment to infer the distribution of insert sizes. [500]

So it correctly infers a much larger insert size.
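Roughly, that inference works like this. The sketch below is a simplified illustration, not bwa's actual code: the IQR-based outlier trimming, the mean + 4 SD cut-off and the 20-pair minimum are assumptions chosen for the example. Only the fall-back to the fixed -a value when there are too few pairs comes from the docs quoted above.

```python
import statistics
from typing import Sequence

FALLBACK_MAX_INSERT = 500  # bwa sampe -a default, per the docs quoted above


def infer_max_insert(insert_sizes: Sequence[int], min_pairs: int = 20,
                     k: float = 4.0) -> int:
    """Estimate a maximum insert size from a batch of confidently mapped pairs.

    Simplified illustration (not bwa's algorithm): trim outliers with an
    IQR rule, then return mean + k * SD of the remaining sizes. With too
    few pairs, fall back to the fixed default, as bwa does with -a.
    """
    if len(insert_sizes) < min_pairs:
        return FALLBACK_MAX_INSERT
    q1, _, q3 = statistics.quantiles(insert_sizes, n=4)
    iqr = q3 - q1
    low, high = q1 - 2 * iqr, q3 + 2 * iqr
    kept = [x for x in insert_sizes if low <= x <= high]
    return round(statistics.mean(kept) + k * statistics.pstdev(kept))


# Hypothetical first-stage insert sizes centred near 452 bp, plus one outlier.
sizes = [400, 420, 440, 452, 465, 485, 505] * 4 + [5000]
print(infer_max_insert(sizes))      # 587
print(infer_max_insert(sizes[:5]))  # 500: too few pairs, use the fallback
```

The outlier at 5000 bp is discarded before the mean and SD are computed, so a handful of chimeric or mis-mapped pairs does not inflate the bound.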

I documented what I did here.

written 8.4 years ago by brentp

Inferring the insert size distribution while mapping is not straightforward, as we have to map the reads first to get the estimate. BWA's batch processing and two-stage mapping happen to make the inference much easier. I say "happen to" because this was not a feature of the initial design. BWA did try hard to make the default options work well for various types of input because 1) less experienced users may use the wrong settings; 2) I may forget to apply the right options; 3) input of mixed quality needs to be processed differently.

written 8.4 years ago by lh3

That should have been caught in review; it's pretty clear the authors didn't bother to look at why bowtie was doing so much worse than other methods.

written 8.4 years ago by David Quigley

Also, on typical simulated data, I am pretty sure that most of the major mappers have very similar sensitivity. The specificity will be very different, but the authors did not measure it. On real data of "standard quality", the mapping sensitivity is typically around 90-96% for the major mappers. Also, I think bowtie should use less memory than BWA, at least for single-end reads. SOAP2's memory usage is never as bad as 13GB. While I am happy that BWA performs well in their evaluation, I think that on sensitivity/speed/memory, others should be as good or even better.
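On simulated data the true origin of every read is known, so sensitivity and placement accuracy can be measured separately. A minimal sketch of that bookkeeping; the read names, coordinates and the 5 bp tolerance are made up for illustration, and "placed correctly" is used here as a simple stand-in for specificity.

```python
from typing import Dict, Optional, Tuple

Locus = Tuple[str, int]  # (chromosome, start position)


def evaluate(truth: Dict[str, Locus],
             mapped: Dict[str, Optional[Locus]],
             tolerance: int = 5) -> Tuple[float, float]:
    """Return (fraction of reads mapped, fraction of mapped reads placed correctly).

    A mapped read counts as correct if it lands on the true chromosome
    within `tolerance` bp of its simulated start (an arbitrary threshold).
    """
    n_mapped = n_correct = 0
    for name, (chrom, pos) in truth.items():
        hit = mapped.get(name)
        if hit is None:  # read left unmapped
            continue
        n_mapped += 1
        if hit[0] == chrom and abs(hit[1] - pos) <= tolerance:
            n_correct += 1
    sensitivity = n_mapped / len(truth)
    correct_rate = n_correct / n_mapped if n_mapped else 0.0
    return sensitivity, correct_rate


truth = {"r1": ("chr1", 100), "r2": ("chr1", 200),
         "r3": ("chr2", 50), "r4": ("chr3", 10)}
mapped = {"r1": ("chr1", 100), "r2": ("chr1", 900),
          "r3": ("chr2", 52), "r4": None}
print(evaluate(truth, mapped))  # (0.75, 0.6666666666666666)
```

Two mappers can report the same first number while differing a lot in the second, which is the point about sensitivity alone being an incomplete comparison.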

written 8.4 years ago by lh3

Brent -- nice analysis. I didn't realize bwa had a size inference step. This is a great demonstration of the power of default values.

written 8.4 years ago by Brad Chapman

Thanks. Yes, and it's also surprising that bowtie has such a default for the maximum insert size (I guess it helps speed).

written 8.4 years ago by brentp


@lh3 I always thought that the 2-stage mapping was a design feature to optimize that :-)

written 8.4 years ago by Pablo

@Pablo: I used the 2-stage mapping initially because I was lazy...

written 8.4 years ago by lh3

Within hours of hearing of this, Nils Homer updated the docs for dnaa to show that it generates pairs with an inner distance specified by -d, not the outer distance.

written 8.4 years ago by brentp
Brad Chapman (Boston, MA) wrote, 8.4 years ago:

The bowtie default for paired-end maximum insert sizes is rather small, only 250bp:

% bowtie --help | grep maxins
-X/--maxins <int>  maximum insert size for paired-end alignment (default: 250)

Failing to raise this for libraries with larger insert sizes will lead to low mapping rates.

bwa sampe is a little more generous, so with default parameters it will do better:

 % bwa sampe
 [...] 
 Options: -a INT   maximum insert size [500]

This also confused me at first, but if you set similar parameters for both, the alignment rates will be similar.
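For a like-for-like run, one option is to generate both command lines from a single maximum insert size. A hedged sketch: the index prefixes and file names are placeholders, and only the -X (bowtie) and -a (bwa sampe) options shown above are used. Keep in mind that bwa sampe only falls back to -a when it cannot infer the insert size from the reads, so the two settings are not strictly equivalent.

```python
from typing import List, Tuple


def matched_commands(max_insert: int,
                     bowtie_index: str, bwa_index: str,
                     reads1: str, reads2: str,
                     sai1: str, sai2: str) -> Tuple[List[str], List[str]]:
    """Build argv lists for bowtie and bwa sampe sharing one maximum insert size."""
    # bowtie: -X/--maxins is a hard upper bound on the fragment length.
    bowtie_cmd = ["bowtie", "-X", str(max_insert), bowtie_index,
                  "-1", reads1, "-2", reads2]
    # bwa sampe: -a is only used when the insert size cannot be inferred.
    bwa_cmd = ["bwa", "sampe", "-a", str(max_insert), bwa_index,
               sai1, sai2, reads1, reads2]
    return bowtie_cmd, bwa_cmd


if __name__ == "__main__":
    # Placeholder paths; pass these lists to subprocess.run() to execute them.
    bt, bw = matched_commands(700, "hg18", "hg18.fa",
                              "reads_1.fq", "reads_2.fq",
                              "reads_1.sai", "reads_2.sai")
    print(" ".join(bt))
    print(" ".join(bw))
```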

written 8.4 years ago by Brad Chapman

I agree, I always change this setting when analysing data with Bowtie.

written 8.4 years ago by Pablo
Josh wrote, 8.4 years ago:

In my experience, BWA is better than Bowtie for mapping PE reads, but Bowtie isn't as bad as the authors say. If you look at the supplementary data, they use the following command for bowtie:

bowtie -t -p 8 -v 2 -a bowtie/hg18 -q ERR008834.filt.fastq >bowtie.map

The -v option makes Bowtie ignore the quality scores and simply apply a hard limit on mismatches. BWA, on the other hand, is a lot more lenient when mapping the second read if the first read in the pair maps well.
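A simplified sketch of how the two bowtie policies treat the same read. The defaults used (-n 2, -l 28, -e 70) are from the bowtie 1 manual, but the real -n mode also rounds qualities to the nearest 10 (saturating at 30), which this sketch does not model. The read and its mismatch positions are hypothetical.

```python
from typing import List, Sequence


def v_mode_ok(mismatch_positions: Sequence[int], v: int = 2) -> bool:
    """-v style: accept if the total mismatch count is <= v; qualities are ignored."""
    return len(mismatch_positions) <= v


def n_mode_ok(mismatch_positions: Sequence[int], quals: Sequence[int],
              n: int = 2, seed_len: int = 28, max_qual_sum: int = 70) -> bool:
    """Simplified -n style: at most n mismatches in the seed (first seed_len
    bases) and a sum of Phred qualities at all mismatched positions that
    does not exceed max_qual_sum (bowtie's quality rounding is omitted)."""
    seed_mismatches = sum(1 for p in mismatch_positions if p < seed_len)
    qual_sum = sum(quals[p] for p in mismatch_positions)
    return seed_mismatches <= n and qual_sum <= max_qual_sum


# Hypothetical 76 bp read: high quality except a noisy 3' end (Q2).
quals: List[int] = [30] * 60 + [2] * 16
mismatches = [62, 68, 74]  # 0-based positions, all in the low-quality tail
print(v_mode_ok(mismatches))         # False: 3 mismatches > -v 2
print(n_mode_ok(mismatches, quals))  # True: none in the seed, quality sum 6
```

Reads with a few low-quality mismatches at the 3' end, common in Illumina data, are rejected outright under -v 2, and in paired-end mode losing either mate loses the pair.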

written 8.4 years ago by Josh

I agree, I found Bowtie a little bit picky, but not fundamentally broken as they claim.

written 8.4 years ago by Pablo