Question: ChIP-Seq Analysis Using MACS At A P-value Of 1E-2 Then Intersecting To Call "True Peaks"
7.3 years ago by
jhrf10 wrote:

I am currently analysing ChIP-seq data for 4 different proteins in order to build up a picture of how their binding correlates across the C. elegans genome. Essentially, I want to see where each protein overlaps with the others.

So far I have called peaks on all of my data sets (which include biological and technical replicates). I am now browsing the data before I start comparing peak sets to find correlations (overlaps, intersections, etc.).

Some of my data is quite noisy, and in order to get the best out of it I have run MACS2 with a relatively lenient p-value threshold (5e-2) and then kept only the peaks that are confirmed across technical and biological replicates, hoping to filter out noise and wrongly called peaks at this step. It seems to have worked empirically and I am seeing sensible results. However, this is my first solo bioinformatics project and I just wanted to check whether this is a sensible method.
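The replicate-confirmation step described above can be sketched as follows. This is only a minimal illustration, assuming peaks have already been loaded as (chromosome, start, end) tuples with BED-style half-open coordinates and that "confirmed" means sharing at least one base with a peak in every other replicate. The names `overlaps` and `confirmed_peaks` and the example coordinates are hypothetical, not part of MACS2.

```python
from typing import List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end), half-open like BED


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two peaks are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def confirmed_peaks(reference: List[Peak], others: List[List[Peak]]) -> List[Peak]:
    """Keep peaks from `reference` that overlap a peak in every other replicate."""
    return [
        peak
        for peak in reference
        if all(any(overlaps(peak, q) for q in rep) for rep in others)
    ]


# Made-up coordinates for three replicates of one protein.
rep1 = [("chrI", 100, 300), ("chrI", 1000, 1200), ("chrII", 50, 150)]
rep2 = [("chrI", 250, 400), ("chrII", 500, 600)]
rep3 = [("chrI", 290, 350), ("chrI", 1100, 1150)]

print(confirmed_peaks(rep1, [rep2, rep3]))
```

For genome-scale peak sets the quadratic scan would be slow; a sorted sweep or a dedicated interval tool would be the more usual choice.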

Is anyone able to recommend a better method? Is my MACS2 cutoff too lenient? Can anyone point me to papers that detail methods for this sort of thing? I bow to the greater knowledge and wisdom of this community. Many thanks.

7.3 years ago by
Cambridge, MA
KCC4.0k wrote:
  1. Instead of using a p-value of 0.05, why not use a q-value of 0.05? I think 0.05 is quite lenient for a p-value cutoff in MACS.

  2. I would also suggest using IDR. It determines the reliability of peaks based on their reproducibility across replicates.

  3. Do you keep the duplicate reads? MACS has a setting to keep just one read per position. You should use it. This can help with the sensitivity to noise.
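Points 1 and 3 could translate into a MACS2 call along these lines. This is a hedged sketch that only builds and prints the command; the file names and sample name are placeholders, and `macs2_callpeak_cmd` is a hypothetical helper. The flags themselves (`-q`, `--keep-dup`, `-g ce` for the C. elegans genome size) are standard `macs2 callpeak` options.

```python
import shlex
from typing import List


def macs2_callpeak_cmd(treatment: str, control: str, name: str,
                       qvalue: float = 0.05) -> List[str]:
    """Build a `macs2 callpeak` command with a q-value cutoff and duplicate filtering."""
    return [
        "macs2", "callpeak",
        "-t", treatment,       # ChIP sample (placeholder file name)
        "-c", control,         # input control (placeholder file name)
        "-f", "BAM",
        "-g", "ce",            # effective genome size shortcut for C. elegans
        "-n", name,
        "-q", str(qvalue),     # q-value cutoff instead of a raw p-value cutoff
        "--keep-dup", "1",     # keep at most one read per position
    ]


cmd = macs2_callpeak_cmd("chip_rep1.bam", "input_rep1.bam", "proteinA_rep1")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it, assuming MACS2 is installed and on the PATH.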

Here are some papers:

Systematic evaluation of factors influencing ChIP-seq fidelity. Nat Methods 2012; 9(6):609-614.
Identifying ChIP-seq enrichment using MACS. Nat Protoc 2012; 7(9):1728-40.
Measuring reproducibility of high-throughput experiments. Ann. Appl. Stat. 2011; 5(3):1752-1779.


Thanks for your comment. I am making my way through the papers you recommend.

Are there any studies on the advantages of using pvalue over qvalue? I think my methods will come under a fair amount of scrutiny and I'd love to have something solid to back it up.

written 7.3 years ago by jhrf10

From what I recall, the author of MACS, Tao Liu, recommends the q-value over the p-value. You can join the MACS mailing list and ask him directly about this. My guess is that the q-value is more empirical, as it's based on the number of false positives in the input control, while the p-value is based on a model of the data that is probably too simple.
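To make the practical difference between the two cutoffs concrete: the MACS2 documentation describes its q-values as Benjamini-Hochberg corrected p-values, so, assuming that definition, the sketch below shows how the same 0.05 threshold passes fewer peaks once the correction is applied. The p-values are made up and `benjamini_hochberg` is an illustrative implementation, not MACS2's own code.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


pvals = [0.001, 0.01, 0.04, 0.05, 0.5]  # made-up per-peak p-values
qvals = benjamini_hochberg(pvals)

passing_p = sum(p <= 0.05 for p in pvals)
passing_q = sum(q <= 0.05 for q in qvals)
print(passing_p, passing_q)
```

With real data the number of tests is in the thousands, so the gap between a p-value and a q-value cutoff at the same nominal level is typically much larger.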

written 7.3 years ago by KCC4.0k
7.3 years ago by
alessandro.riccombeni20 wrote:

Hi jhrf, if this is your first "solo" project I recommend starting by looking at what other people have done. The IDR test suggested by George is a great way to start, as it has been recommended by the ENCODE project itself. You should probably start by reading this:

And then go through the method linked by George, and verify the results you get from your data. You might try to set q at 0.05 and 0.01 and compare the results.

Also, try to settle on (if you haven't already) a quantitative definition of "overlap" for your peaks. Peaks from different replicates located in the same promoter could have relatively distant summits, and since the summit is where the binding site is likeliest to be, you could be merging different binding sites.
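One possible quantitative definition is a maximum distance between summits. The sketch below pairs each summit in one replicate with the nearest summit on the same chromosome in another and keeps the pair only if they are close enough. The 100 bp default, the function name `summits_within`, and the coordinates are assumptions for illustration; a suitable distance depends on fragment size and the factor studied.

```python
from typing import List, Tuple

Summit = Tuple[str, int]  # (chromosome, summit position)


def summits_within(a: List[Summit], b: List[Summit],
                   max_dist: int = 100) -> List[Tuple[Summit, Summit]]:
    """Pair each summit in `a` with its nearest summit in `b` if within `max_dist` bp."""
    pairs = []
    for chrom_a, pos_a in a:
        # Distances to every summit in `b` on the same chromosome.
        candidates = [
            (abs(pos_a - pos_b), (chrom_b, pos_b))
            for chrom_b, pos_b in b
            if chrom_b == chrom_a
        ]
        if candidates:
            dist, nearest = min(candidates)
            if dist <= max_dist:
                pairs.append(((chrom_a, pos_a), nearest))
    return pairs


# Made-up summit positions for two replicates.
rep1 = [("chrI", 1000), ("chrI", 5000), ("chrII", 300)]
rep2 = [("chrI", 1060), ("chrI", 5400), ("chrIII", 300)]

print(summits_within(rep1, rep2, max_dist=100))
```

Here the two chrI summits 60 bp apart are treated as the same site, while the pair 400 bp apart is not, even though their peak intervals might still overlap.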


Thank you for the recommendations, I will consider them in my methods.

written 7.3 years ago by jhrf10