BatMeth looks to be a very impressive tool for mapping BS-Seq data in base-space and colorspace. http://genomebiology.com/2012/13/10/R82
Figure 2 shows a comparison with some other aligners: http://genomebiology.com/2012/13/10/R82/figure/F2
BatMeth has a lower false-positive rate, with good speed and a good true-positive rate.
They evaluate on real data in base and colorspace, and on simulated data (and it's not their own simulator).
The method is outlined in figure 4: http://genomebiology.com/2012/13/10/R82/figure/F4
Briefly, the process for a single read is:
- convert reference for + and -
- convert read for + and -
- check 4 possible mappings of read
- exclude read if it maps to > N possible locations
- compute number of mismatches and report its status as unique or not (only unique reads are used in calculation).
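To make the steps above concrete, here is a toy sketch of the general in-silico conversion idea. This is not BatMeth's code: the function names, the exact-match search (no mismatches or indels), and the `max_hits` cutoff standing in for N are all my own assumptions for illustration.

```python
def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def c_to_t(seq):
    """In-silico bisulfite conversion as seen on the + strand."""
    return seq.upper().replace("C", "T")

def g_to_a(seq):
    """The same conversion as seen from the - strand, in + coordinates."""
    return seq.upper().replace("G", "A")

def find_all(needle, haystack):
    """All start positions (overlaps included) of needle in haystack."""
    hits, start = [], haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits

def map_read(read, reference, max_hits=2):
    """Try the 4 read/reference combinations using exact matching only.

    max_hits is a hypothetical stand-in for the "> N locations" cutoff.
    Returns (status, hits), where each hit is (orientation, conversion, position).
    """
    conversions = {"C>T": c_to_t, "G>A": g_to_a}
    orientations = {"+": read, "-": revcomp(read)}
    hits = []
    for conv_name, convert in conversions.items():
        converted_ref = convert(reference)
        for strand, oriented in orientations.items():
            for pos in find_all(convert(oriented), converted_ref):
                hits.append((strand, conv_name, pos))
    if not hits:
        status = "unmapped"
    elif len(hits) == 1:
        status = "unique"
    elif len(hits) > max_hits:
        status = "excluded"
    else:
        status = "multi"
    return status, hits

# The read is reference[2:10] with its unmethylated Cs read as Ts.
print(map_read("ATGGATTT", "TTACGGATCCAGTCAGGCTA"))
```

A real mapper would use an index and tolerate mismatches; the point here is only that converting both read and reference lets an ordinary string search find bisulfite-treated reads.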
Since this is similar or identical to other BS-Seq mappers, it's not clear to me where they gain in accuracy. It must be that:
- They discard low-complexity reads (based on Shannon entropy).
- They discard reads that are not unique.
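For the low-complexity filter, a minimal sketch of a Shannon-entropy check might look like the following. The per-base entropy definition and the `min_entropy` threshold of 1.0 bits are my assumptions; I don't know the exact formula or cutoff BatMeth uses.

```python
import math
from collections import Counter

def shannon_entropy(seq):
    """Shannon entropy in bits of the base composition of seq."""
    if not seq:
        return 0.0
    n = len(seq)
    counts = Counter(seq.upper())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def keep_read(seq, min_entropy=1.0):
    """Keep a read only if its entropy reaches a (hypothetical) threshold."""
    return shannon_entropy(seq) >= min_entropy

for read in ("AAAAAAAAAAAA", "ATATATATATAT", "ACGTTGCAAGCT"):
    print(read, round(shannon_entropy(read), 3), keep_read(read))
```

A homopolymer scores 0 bits and a read with all four bases in equal proportion scores 2 bits, so a filter like this matters more after bisulfite conversion, which pushes reads toward a three-letter alphabet.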
On an unrelated note, it seems these bisulfite mappers have the strangest names:
- BS Seeker
- BatMeth
- BS Map
When I first wrote MethylCoder, it spent a few days as "MethLab".
Hehe, for a second I imagined what it would sound like to ask for funding to "improve the MethLab run times".