I think I have a pretty good understanding of how the MACS algorithm works. In a nutshell:
- Remove duplicate reads.
- Model a Poisson distribution for the background based on genome size and read count.
- Detect candidate peaks based on p-value.
- Estimate Poisson rates (lambda) from local windows of 1 kbp, 2 kbp, 5 kbp, and 10 kbp around each candidate peak, and take the maximum of these rates at each position.
- Calculate the p-value of each candidate peak with respect to this local rate to filter out false positives.
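
To make the background and local-rate steps concrete, here is a rough sketch of how I picture them. This is not MACS's actual code: the function names, the dictionary of window counts, and the choice to scale each window's count to the peak width are my own assumptions for illustration, and the window sizes are simply the ones I listed above.

```python
import math


def poisson_sf(k, lam):
    """Upper-tail probability P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    # Sum P(X = i) for i = 0 .. k-1, then take the complement.
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def background_lambda(total_reads, genome_size, width):
    """Expected reads in a region of `width` bp if reads were spread uniformly."""
    return total_reads * width / genome_size


def local_lambda(lambda_bg, window_counts, width):
    """Maximum of the background rate and the rates from each local window.

    `window_counts` maps window size in bp to the read count observed in that
    window (hypothetical input format). Each count is rescaled to `width` bp.
    """
    rates = [count * width / window for window, count in window_counts.items()]
    return max([lambda_bg] + rates)


def peak_p_value(reads_in_peak, lam):
    """P-value of seeing at least `reads_in_peak` reads under Poisson(lam)."""
    return poisson_sf(reads_in_peak, lam)


# Made-up numbers: 1M reads, 100 Mbp genome, 200 bp candidate peak.
lam_bg = background_lambda(1_000_000, 100_000_000, 200)
lam_local = local_lambda(lam_bg, {1000: 25, 2000: 30, 5000: 50, 10000: 80}, 200)
p_bg = peak_p_value(10, lam_bg)
p_local = peak_p_value(10, lam_local)
```

In this toy case the local rate is higher than the genome-wide rate, so the same candidate peak gets a larger (less significant) p-value against the local rate than against the background. That is the false-positive filtering effect I describe in the last step.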
However, I'm unable to find any information on the algorithmic differences between MACS and MACS2. What changes, if any, does MACS2 make to this algorithm? Also, how do the algorithms for finding broad peaks and narrow peaks differ in MACS2? Does anybody know where I can find out more about this?