Peak calling: MACS2 vs HOMER
4.9 years ago

Hi, I am trying to analyze ChIP-seq data for a factor that does not behave like a typical transcription factor. With MACS2 I get hardly any significant peaks, around 20, whereas HOMER gives me about 20,000. I cannot work out which peak caller to use. MACS2 is not even picking up peaks that I can see in IGV. Can somebody help me understand the reason behind this? I am using the broad peak calling option. If I decide to go ahead with HOMER, on what basis should I filter my peaks? I am planning to correlate the binding profile with transcriptome data afterwards.
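One way to start reasoning about a 20 vs 20,000 discrepancy is to check whether the small peak set is contained in the large one. The following is a minimal sketch, not part of either tool: it assumes both peak sets are available as tab-separated BED-like lines (chromosome, start, end), which for HOMER means converting its peak file first (for example with HOMER's pos2bed.pl). The function names and the toy coordinates are hypothetical.

```python
from collections import defaultdict


def parse_bed(lines):
    """Parse BED-like lines (chrom, start, end, ...) into {chrom: [(start, end), ...]}."""
    peaks = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and common header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        peaks[chrom].append((int(start), int(end)))
    return dict(peaks)


def count_overlapping(query, reference):
    """Count query intervals that overlap at least one reference interval by >= 1 bp.

    Uses a simple pairwise scan per chromosome, which is fine for a sketch
    but slow for very large peak sets.
    """
    hits = 0
    for chrom, intervals in query.items():
        ref = reference.get(chrom, [])
        for start, end in intervals:
            # Half-open intervals overlap when each starts before the other ends.
            if any(s < end and start < e for s, e in ref):
                hits += 1
    return hits


# Toy data standing in for real peak files (hypothetical coordinates).
macs_peaks = parse_bed(["chr1\t100\t500", "chr2\t1000\t3000"])
homer_peaks = parse_bed(["chr1\t450\t700", "chr1\t900\t1200", "chr2\t5000\t6000"])

macs_in_homer = count_overlapping(macs_peaks, homer_peaks)
homer_in_macs = count_overlapping(homer_peaks, macs_peaks)
print(f"MACS2 peaks overlapping a HOMER peak: {macs_in_homer}")
print(f"HOMER peaks overlapping a MACS2 peak: {homer_in_macs}")
```

If nearly all of the MACS2 peaks fall inside HOMER peaks, the two callers probably agree on the strongest regions and differ mainly in stringency. If they do not, the two runs are likely not comparable as configured.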

ChIP-Seq • 6.8k views

Can you post the MACS2 and HOMER commands?


Agreed. You should at least post the commands to show what you have tried.


Why would you use the broad peak option if you are calling a TF?

4.8 years ago

I encountered this exact situation in the past and decided to abandon both HOMER and MACS2. I spent, literally, months working with both programs, trying every parameter configuration I could, but they were simply unable to detect the peaks correctly. Since they couldn't identify the peaks correctly, none of the statistics they reported could be trusted either.

Two programs that worked better for me were SICER and deepTools:

SICER takes a while to understand, but you'll just have to read the manual. You could take a look at this presentation from a colleague in Boston: http://cistrome.org/~czang/data/SICERtest/ChIPseq_SICER_CZ.pdf

With deepTools, which I would say is the superior ChIP-seq analysis suite, the bamCoverage tool produces a coverage track that will help you locate peak regions precisely, and there are other tools in the suite that allow you to compare samples. I think that producing a bedGraph file with deepTools/bamCoverage and then feeding it into SICER for differential analysis is a possibility, but that's a pipeline that I've yet to try.
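To illustrate how a binned coverage track can be turned into candidate regions of very different widths, here is a minimal sketch. It is not SICER's algorithm and not a deepTools function: it simply keeps bins whose value reaches a threshold and merges neighbouring kept bins, tolerating gaps up to `max_gap` base pairs. The function name, the threshold and the toy bins are all hypothetical, and a fixed cutoff like this carries no statistical significance by itself.

```python
def call_enriched_regions(bins, threshold, max_gap=0):
    """Merge bedGraph-style bins with value >= threshold into regions.

    bins: iterable of (chrom, start, end, value), sorted by chrom then start.
    max_gap: kept bins on the same chromosome separated by at most this many
             base pairs are merged into one region.
    Returns a list of (chrom, start, end) tuples.
    """
    regions = []
    current = None  # region being extended, stored as [chrom, start, end]
    for chrom, start, end, value in bins:
        if value < threshold:
            continue  # bin is not enriched, ignore it
        if current and current[0] == chrom and start - current[2] <= max_gap:
            # Close enough to the open region: extend it.
            current[2] = max(current[2], end)
        else:
            # Start a new region, saving the previous one if there was one.
            if current:
                regions.append(tuple(current))
            current = [chrom, start, end]
    if current:
        regions.append(tuple(current))
    return regions


# Toy 200 bp bins standing in for a real bedGraph (hypothetical values).
toy_bins = [
    ("chr1", 0, 200, 1.0),
    ("chr1", 200, 400, 8.0),
    ("chr1", 400, 600, 9.0),
    ("chr1", 600, 800, 2.0),
    ("chr1", 800, 1000, 7.5),
    ("chr1", 1000, 1200, 0.5),
    ("chr2", 0, 200, 10.0),
]

strict = call_enriched_regions(toy_bins, threshold=5.0, max_gap=0)
gapped = call_enriched_regions(toy_bins, threshold=5.0, max_gap=200)
print(strict)
print(gapped)
```

Allowing a gap lets a single weak bin sit inside an otherwise enriched domain without splitting it, which matters for factors that produce extended regions rather than sharp peaks.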


That is a bold statement about two accepted and established tools. Could you give some details on which issues you encountered in particular, and what kind of gold standard you used to judge that many peaks are either missed or falsely detected? I would be very interested in that.


Everything must be questioned in bioinformatics, always. That's how we improve both ourselves and the quality of our research.

I make these comments specifically in relation to experiments like the OP's or the one I was working on, i.e., for transcription factors or other DNA-binding proteins that don't behave in the traditional fashion. My marker, in particular, could produce short or extended peak regions (from a few thousand to > 1 million base pairs). No configuration of parameters that I tried in either program identified the peaks quite right. I know of other independent investigators who likewise abandoned their efforts.

That said, HOMER and MACS are great programs that have served the epigenetics community very well for studying the activity of numerous transcription factors.

Please take a look at the work in which I was involved here: https://www.ncbi.nlm.nih.gov/pubmed/28199841

Cheers!


I also spent a lot of time using MACS2 to analyse around 100 in-house chromatin data sets, including ATAC-seq, several TFs and several histone modifications. I agree that there are some discrepancies in the statistics employed by MACS2 and in the way it defines significant peaks, but there are well-accepted workarounds. Both the ENCODE and Roadmap Epigenomics communities use MACS2 as one of their peak callers. I don't think MACS2 is something you can ignore or abandon.