9.0 years ago by
Salt Lake City, UT
@Brad is right. Though with the default parameters to dwgsim (which was used to generate the paired end reads), if both the documentation and implementations of bowtie and dwgsim were correct, it should have mapped more reads than it did. I wanted to understand why.
I ran a small simulation on a single chromosome. Indeed, using the default parameters for bowtie (as did the paper), there are about 0.1% of reads mapped.
If I increase the max-insert-size to 700 (default is 250), then over 65% of the pairs are mapped.
Doing an insert size test on the mapped reads looks like this.
So, although it told dnaa to use an average outer distance of 300, it actually seems to be generating pairs with an inner distance of 300 since the mean is about 300 + 76 + 76 == 452.
The reason that bwa maps them (even though it already has a default of 500 for the max insert size) is that it actually doesn't use that parameter unless it is unable to guess the insert size on its own from the reads. From the docs
Maximum insert size for a read pair to be considered being mapped properly.
Since 0.4.5, this option is only used when there are not enough good
alignment to infer the distribution of insert sizes. 
So it correctly infers a much larger insert size.
I documented what I did here.