Hello, I am an MSc student new to the field of bioinformatics, and I am currently analyzing ChIP-seq data for which peaks were called using the earlier version of MACS. A student in my lab recently re-ran peak calling on the data (histone marks and transcription factors) using MACS 2.0. We consistently obtain fewer peaks with 2.0. Could someone tell me what the main differences between the two versions are and how they should affect the peaks called? Also, why is MACS 1.4 still available for use? Thank you in advance!
See this link; it seems to answer your question. There are several opinions there.
"The differences between MACS 1.4.1 and MACS 1.4.2 are mainly changes to the default command-line arguments. These changes are notable. They include changing --on-auto to --off-auto, which (iirc) means that even if the number of paired peaks is less than 1000, MACS will still use the calculated shiftsize. Version 1.4.2 also changes --to-large to --to-small by default, and --keep-dups is turned off. See the 1.4.2 changelog for details.
I have not used MACS 2, but it is probably true that it is better at calling broad peaks. A recent review of ChIP-seq guidelines from the ENCODE project mentions this specifically. The paper also has good information on quality control for ChIP-seq and a short review of the software; see "ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia". Full text is here: http://www.ncbi.nlm.nih.gov/pubmed/22955991
The changes from 1.4 to 2.0, according to the changelog:
2011-05-17 Tao Liu <email@example.com>
MACS Version 2.0.0 (tag:alpha)

* Use bedGraph type to store data internally and externally. We can have theoretically one-basepair resolution profiles. 10 times smaller in filesize and even smaller after converting to bigWig for visualization.

* Peak calling process modified. Better peak boundary detection. Extend ChIP tag to d, and pileup to have a ChIP bedGraph. Extend Control tag to d and 1,000bp, and pileup to two bedGraphs. (1000bp one will be averaged to d size) Then calculate the maximum value of these two tracks and a global background, to have a local-lambda bedGraph. Use -10log10poisson_pvalue as scores to generate a score track before peak calling. A general peak calling based on a score cutoff, min length of peak and max gap between nearby peaks.

* Option changes. Wiggle file output is removed. Now we only support bedGraph output. The generation of bedGraph is highly recommended since it will not cost extra time. In other words, bedGraph generation is internally run even you don't want to save bedGraphs on disk, due to the peak calling algorithm in MACS v2.

* cProb.pyx We now can calculate poisson pvalue in log space so that the score (-10*log10pvalue) will not have a upper limit of 3100 due to precision of float number.

* Cython is adopted to speed up Python code.
I guess this means there are no major algorithmic differences between the programs.
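For anyone who wants to see what the "Peak calling process modified" entry amounts to, here is a rough Python sketch of the steps the changelog lists: pileup, local lambda, a -10*log10 Poisson p-value score track, then calling on cutoff / min length / max gap. This is not MACS code. All function names are made up for illustration, and several details are my own assumptions: tags are extended to the right only and strand is ignored, the control is not scaled to the ChIP depth, the global background is taken as (control tags * d / genome length), and the p-value is the upper tail P(X >= k).

```python
import math


def pileup(tag_starts, ext, genome_len):
    """Per-base coverage after extending each tag start `ext` bp to the right."""
    diff = [0] * (genome_len + 1)
    for s in tag_starts:
        lo, hi = max(0, s), min(genome_len, s + ext)
        if lo < hi:
            diff[lo] += 1
            diff[hi] -= 1
    cov, running = [], 0
    for i in range(genome_len):
        running += diff[i]
        cov.append(running)
    return cov


def log10_poisson_sf(k, lam):
    """log10 of P(X >= k) for X ~ Poisson(lam), summed in log space."""
    if lam <= 0:
        raise ValueError("lambda must be positive")
    if k <= 0:
        return 0.0
    # Log of the first term of the tail sum: e^-lam * lam^k / k!
    log_first = -lam + k * math.log(lam) - math.lgamma(k + 1)
    # Remaining terms, expressed relative to the first one.
    total, ratio, i = 1.0, 1.0, k
    while True:
        i += 1
        ratio *= lam / i
        total += ratio
        if ratio < 1e-15 * total:
            break
    return (log_first + math.log(total)) / math.log(10)


def score_track(chip_starts, ctrl_starts, d, genome_len, wide=1000):
    """Per-base -10*log10(p) score using a local lambda (control must be non-empty)."""
    chip = pileup(chip_starts, d, genome_len)
    ctrl_d = pileup(ctrl_starts, d, genome_len)
    ctrl_wide = pileup(ctrl_starts, wide, genome_len)
    background = len(ctrl_starts) * d / genome_len  # assumed global background
    scores = []
    for c, cd, cw in zip(chip, ctrl_d, ctrl_wide):
        # The wide control track is averaged down to d size before taking the max.
        lam = max(cd, cw * d / wide, background)
        scores.append(-10.0 * log10_poisson_sf(c, lam))
    return scores


def call_peaks(scores, cutoff, min_len, max_gap):
    """Half-open (start, end) regions scoring >= cutoff, merged across small gaps."""
    peaks = []
    start = end = None
    for pos, s in enumerate(scores):
        if s < cutoff:
            continue
        if start is not None and pos - end <= max_gap:
            end = pos + 1  # extend the current region across a short gap
        else:
            if start is not None and end - start >= min_len:
                peaks.append((start, end))
            start, end = pos, pos + 1
    if start is not None and end - start >= min_len:
        peaks.append((start, end))
    return peaks


if __name__ == "__main__":
    # Toy data: 20 ChIP tags stacked at position 100, two control tags elsewhere.
    scores = score_track([100] * 20, [500, 1500], d=50, genome_len=2000)
    print(call_peaks(scores, cutoff=50, min_len=50, max_gap=10))
```

Because the p-value is kept as a logarithm, a very strong enrichment still gets a finite score instead of underflowing to zero, which is the point of the cProb.pyx entry about the 3100 limit. The series loop is fine for toy inputs but would need more care for very large lambda.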