The bwa-meth paper has been rejected from bioinformatics after the 2nd revision. There was some concern about how I did the comparisons [either inadequate or unfair], for example it was suggested that one was able to get more reads to align using default parameters for bsmap. This is true, see the image at the end of this post where the red dot is the default parameters and the blue is the parameters I chose. Unfortunately, all of the additional alignments with default params are incorrect--this is for simulated data, so it is possible to know the additional reads were aligned off-target.
I think there is value in the comparison presented in the bwa-meth paper, and to my knowledge, it is the most thorough evaluation of BS-Seq read mappers and the only one on genome-wide real data. Regardless of whether you should choose to use bwa-meth, there are some important points (the 2nd is implicit, but you can find it in the code and more on that below):
- If you use a bowtie (1 or 2) backed aligner, you must quality-trim your reads.
- There is great variability in the work required to post-process the output from these into usable data.
- Number of reads mapped is not a useful metric.
There are a number of things I didn't write in the paper, but that I think are worth mentioning, most are somehow related to the 2nd point above. For example, the output from bsmooth comes in a watson bam and a crick bam, I did coordinate flipping, but it's not clear to me the full set of steps that would be required to make the crick bam usable. (See dpryan's take on it, and my handling of it). For LAST, which, as you can see performs very well, I had to remove the requirement that the reads are paired since it does not set that flag for a lot of true pairs--clearly this could be fixed, but it is an issue. LAST prefers to output MAF, so it is likely that users who perfer that format will have no trouble. For the most part, I tend to use BAM. Since I started writing bwa-meth (which is now some time ago), bismark has greatly improved, adding sorted header output, compressed and BAM output, and mapping quality scores in the most recent version released in the last few days. Bismark, BSMAP, and GSNAP were the aligners that just worked. I am not sure that I have represented bison in its best light as Devon Ryan has shown it to be quite good on the real dataset used in the paper.
Finding the right parameters for 8 aligners is hard, I attempted to find the best case for each aligner (figure below is case-in-point). I welcome any changes in that regard, one advantage of keeping the paper on arXiv is that I can continue to make updates. Although I did a number of searches for the best parameters for other aligners (see this script where I gridded parameters for bismark+bowtie2 to find the best performing), I actually did very little optimization on bwa-meth (thanks to Heng Li for making great software). You can see in the code that I change the parameters from the default, but using the defaults makes a small enough difference that it is probably not noticeable in the ROC plots.
I have asked the editors of bioinformatics if I can make the reviews public as I think they are of interest. Just as it's difficult to do the comparison fairly, it's difficult to review this type of paper, not having an open dialog with reviewers makes this intractable.
The comparisons that I did are reproducible. The entire code is available on github in the bwa-meth repo. For example, you can see the change I made to the code to compare bsmap parameters in the image below in this revision. If you see any errors or any way to improve the performance of the aligners open an issue or send me an email. Oh, and same if you see a way to improve the bwa-meth.