We run a control with every ChIP experiment: a mock IP, which is usually a pull-down with anti-GFP in a sample where the protein is not tagged.
As an exercise, I pulled up 3 mocks that were done with the same protocol but on different dates, some by the same person in the lab and some by different people.
Using these 3 mocks to call peaks on the same sample generates different results, because the mocks are differentially enriched. As a plain biologist, one would say that the mocks are more or less the same, so the peaks should be more or less the same. The question is which peak set best represents the actual binding sites. I have made a few attempts to solve this dilemma:
- Calling peaks on the sample with each control and comparing the total number of peaks. The numbers vary a lot (from 10K to 70K), so the overlap analysis doesn't make much sense unless I reduce the larger peak set to a smaller one to get comparable numbers.
- Visualizing all the controls in the UCSC browser (they mostly show enrichment at the same places, but the height of the peaks differs).
- Using prior biological knowledge of where the protein should bind (we work with proteins that regulate gene expression, and most of them bind at promoters).
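A minimal sketch of the first attempt above (trimming the larger peak set so the numbers are comparable, then checking overlap). It assumes each peak has already been parsed into a `(chrom, start, end, score)` tuple with BED-style half-open coordinates; the names `top_n` and `fraction_overlapping` and the example peaks are hypothetical, not part of MACS or any other tool.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate


def top_n(peaks, n):
    """Keep the n highest-scoring peaks (score is the 4th tuple field)."""
    return sorted(peaks, key=lambda p: p[3], reverse=True)[:n]


def build_index(peaks):
    """Per chromosome: sorted starts plus the running maximum of the ends."""
    by_chrom = defaultdict(list)
    for chrom, start, end, _score in peaks:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        max_ends = list(accumulate((e for _, e in intervals), max))
        index[chrom] = (starts, max_ends)
    return index


def fraction_overlapping(query, reference):
    """Fraction of query peaks overlapping at least one reference peak."""
    if not query:
        return 0.0
    index = build_index(reference)
    hits = 0
    for chrom, start, end, _score in query:
        if chrom not in index:
            continue
        starts, max_ends = index[chrom]
        # Reference peaks 0..i start before this query peak ends.
        i = bisect_left(starts, end) - 1
        # One of them overlaps if its end lies beyond the query start.
        if i >= 0 and max_ends[i] > start:
            hits += 1
    return hits / len(query)


if __name__ == "__main__":
    # Made-up peaks, for illustration only.
    big_set = [("chr1", 100, 200, 50.0), ("chr1", 500, 600, 10.0),
               ("chr2", 100, 300, 30.0), ("chr1", 900, 950, 5.0)]
    small_set = [("chr1", 150, 250, 40.0), ("chr2", 250, 400, 20.0)]
    trimmed = top_n(big_set, len(small_set))
    print(fraction_overlapping(trimmed, small_set))
```

Trimming by score before comparing keeps the overlap fraction from being dominated by the many weak peaks in the larger set, although it does assume that the scores rank peaks sensibly within each set.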
Ideas to test
- Generate an averaged mock by combining all 3 files, treating them as technical replicates.
- Repeat the analysis with a different peak caller.
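One way to picture the averaged mock is as a depth-normalised mean of binned read counts. The sketch below assumes the reads of each mock have already been counted in the same fixed-size genomic bins; `to_cpm`, `average_mock` and the counts are hypothetical. It only illustrates the averaging step and is not a replacement for whatever control input the peak caller expects.

```python
def to_cpm(counts):
    """Scale raw bin counts to counts per million mapped reads."""
    total = sum(counts)
    if total == 0:
        raise ValueError("mock has no reads")
    return [c * 1e6 / total for c in counts]


def average_mock(mocks):
    """Average several mocks bin by bin after depth normalisation."""
    if not mocks:
        raise ValueError("need at least one mock")
    n_bins = len(mocks[0])
    if any(len(m) != n_bins for m in mocks):
        raise ValueError("all mocks must use the same bins")
    scaled = [to_cpm(m) for m in mocks]
    # zip(*scaled) walks the bins; each column holds one value per mock.
    return [sum(column) / len(scaled) for column in zip(*scaled)]


if __name__ == "__main__":
    # Made-up counts for three mocks over the same three bins.
    mocks = [[10, 30, 60], [1, 3, 6], [20, 20, 60]]
    print(average_mock(mocks))
```

Normalising before averaging gives each mock equal weight; simply summing raw counts would instead let the most deeply sequenced mock dominate.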
I am using MACS14 for peak calling with default parameters. Can you suggest some other kinds of tests or peak callers? Is there a tool that can calculate the similarity between two BED files in terms of enrichment, other than using intersectBed to check how much they overlap?
In principle, another option is the BEDOPS suite, to calculate enrichment within specific boundaries (whole gene body, promoters, etc.) and then make a scatter plot. I need more ideas.
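The scatter-plot idea can be reduced to a single number by correlating the per-region enrichment of two mocks. The sketch below assumes the read counts per region (for example per promoter) have already been computed with some other tool; `log_enrichment`, `pearson`, the pseudocount of 1 and the counts are illustrative assumptions.

```python
import math


def log_enrichment(counts, pseudocount=1.0):
    """log2-transform counts; the pseudocount avoids log2(0)."""
    return [math.log2(c + pseudocount) for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up read counts over the same five promoters in two mocks.
    mock_a = [12, 40, 3, 150, 8]
    mock_b = [10, 55, 1, 120, 15]
    r = pearson(log_enrichment(mock_a), log_enrichment(mock_b))
    print(f"Pearson r on log2 counts: {r:.3f}")
```

Working on log-scaled counts keeps a handful of very high regions from driving the correlation; a rank-based measure such as Spearman would be a reasonable alternative.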
P.S. I can add a graph to illustrate this better, if required.