We are working on ChIP-seq data targeting histone modifications that may be widely dispersed across the genome. We expect the enrichment to be much less peaky and more 'foothills vs. plains'. Does anyone have tips or tricks for getting the most out of peak-calling software in this situation, or any alternative strategies? We are using MACS 1.4 and QuEST but are open to trying other methods.
We have looked at this type of data in the following papers:
- DOI: 10.1038/nature08924
- DOI: 10.1016/j.molcel.2010.01.030
- DOI: 10.1371/journal.pgen.1001134
We ended up writing our own set of Perl scripts and changed the variables based on the type of data examined (full details and rationales are in the methods sections of these papers).
Our 'peak finder' defines a peak as an area where values are above the height threshold, the above-threshold values are less than the specified gap apart, and the whole area falls within the length thresholds. It outputs the peaks as a BED file with statistics. I'll put our peak-finding script here, but please be warned that it was not intended for release and is still largely undocumented. If there is a need, we can tidy it up and put it on the Galaxy Tool Shed.
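To make that definition concrete, here is a minimal Python sketch of this kind of rule. It is not the Perl script itself: the function names `find_peaks` and `to_bed`, the parameter names, the binned `(start, end, value)` input and the particular statistics reported (mean and maximum height) are all assumptions made for illustration.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    mean_height: float  # unweighted mean of the above-threshold bin values
    max_height: float


def find_peaks(
    chrom: str,
    bins: Iterable[Tuple[int, int, float]],
    height: float,
    max_gap: int,
    min_len: int,
    max_len: int,
) -> List[Peak]:
    """Call peaks on one chromosome (hypothetical helper, not the original script).

    `bins` must be (start, end, value) tuples sorted by start.
    Bins with value > height are joined into one peak while the distance
    between neighbouring bins is < max_gap; a peak is kept only if its
    total length lies within [min_len, max_len].
    """
    peaks: List[Peak] = []
    current: List[Tuple[int, int, float]] = []  # bins in the open candidate

    def flush() -> None:
        # Close the open candidate and keep it if it passes the length filter.
        if not current:
            return
        start, end = current[0][0], current[-1][1]
        if min_len <= end - start <= max_len:
            values = [v for _, _, v in current]
            peaks.append(
                Peak(chrom, start, end, sum(values) / len(values), max(values))
            )
        current.clear()

    for start, end, value in bins:
        if value <= height:
            continue  # below the height threshold: treated as part of a gap
        if current and start - current[-1][1] >= max_gap:
            flush()  # gap too wide: start a new candidate
        current.append((start, end, value))
    flush()
    return peaks


def to_bed(peaks: Iterable[Peak]) -> str:
    """Format peaks as BED-like lines: chrom, start, end, name, mean, max."""
    return "\n".join(
        f"{p.chrom}\t{p.start}\t{p.end}\tpeak_{i}\t"
        f"{p.mean_height:.2f}\t{p.max_height:.2f}"
        for i, p in enumerate(peaks, 1)
    )
```

For broad marks you would typically lower the height threshold and raise the gap and maximum length compared with settings for sharp peaks; the right values depend on the data and have to be tuned.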
Nice discussion topic -- I've also struggled with large histone modification peaks. One approach we had some success with is using MACS 1.4 with the call-subpeaks option to subdivide the larger peaks using PeakSplitter:
We then overlapped these with nucleosome position calls from NPS:
This was helpful for getting a more refined set of reference nucleosome regions that could then be used for comparisons between experiments. Here's the Python code used to combine the NPS and MACS calls:
I've used SICER and CCAT with reasonable success for this kind of problem. The former is specifically designed for diffuse enrichment regions (but not TFs) and the latter has a "peak mode" for TFs and similar cases and a "region mode" for cases where you expect a more spread-out enrichment.