Question: How to filter ChIP-Seq peaks called by MACS based on their reported measurements?
Asked 9 months ago by nitinnarwade1504:

Hello everyone, I am a beginner in this field.

My question is about ChIP-Seq data analysis. I have analyzed ChIP-Seq data for the human BRCA1 protein with two replicates. The data quality is good, and I have successfully reached the peak-calling step.

I used Bowtie to align the raw reads and MACS14 for peak calling, which gave me 96483 peaks.

Is it possible for a single successful ChIP-Seq experiment to produce this many peaks? Or, if I want to reduce the number of peaks, which criterion is better? I have already tried filtering on FDR (%) [as described at http://onetipperday.sterding.com/2013/08/how-to-select-macs-peaks-based-on-p.html], but the FDR (%) filter discarded barely 10 peaks. As for the p-value criterion, the peaks have already been filtered by MACS14.

Any help would be really appreciated.

Thank you....!!
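Since FDR alone removes almost nothing here, one option is to filter on several of the columns MACS14 reports at once. The following is a minimal sketch, not a definitive recipe. It assumes the tab-separated NAME_peaks.xls layout with the column names `-10*log10(pvalue)`, `fold_enrichment` and `FDR(%)`; check these against the header of your own file, since the FDR column may be missing (for example in runs without a control). The example rows and cutoffs are hypothetical.

```python
import csv

# Hypothetical excerpt of a MACS14 NAME_peaks.xls file (tab-separated).
# Column names are assumed; verify them against your own output header.
EXAMPLE_XLS = (
    "# comment lines written by MACS14 are skipped\n"
    "\n"
    "chr\tstart\tend\tlength\tsummit\ttags\t-10*log10(pvalue)\tfold_enrichment\tFDR(%)\n"
    "chr1\t1000\t1500\t501\t250\t40\t120.50\t8.20\t0.10\n"
    "chr1\t5000\t5300\t301\t150\t12\t55.10\t2.10\t4.80\n"
    "chr2\t200\t700\t501\t240\t25\t80.00\t4.00\t1.20\n"
)


def read_macs14_peaks(text):
    """Parse peaks.xls content, skipping '#' comment lines and blank lines."""
    rows = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    peaks = []
    for row in csv.DictReader(rows, delimiter="\t"):
        peaks.append({
            "chr": row["chr"],
            "start": int(row["start"]),
            "end": int(row["end"]),
            "score": float(row["-10*log10(pvalue)"]),  # -10*log10(p-value)
            "fold": float(row["fold_enrichment"]),
            "fdr": float(row["FDR(%)"]),  # FDR in percent
        })
    return peaks


def filter_peaks(peaks, min_score=50.0, min_fold=1.0, max_fdr=100.0):
    """Keep peaks that pass all three cutoffs (min_score is on the -10*log10(p) scale)."""
    return [
        p for p in peaks
        if p["score"] >= min_score and p["fold"] >= min_fold and p["fdr"] <= max_fdr
    ]


if __name__ == "__main__":
    peaks = read_macs14_peaks(EXAMPLE_XLS)
    # min_score=100 corresponds to p <= 1e-10; fold and FDR cutoffs are arbitrary examples.
    strict = filter_peaks(peaks, min_score=100.0, min_fold=4.0, max_fdr=1.0)
    print(len(peaks), "peaks before filtering,", len(strict), "after")
```

The cutoffs are independent knobs on purpose: fold enrichment often separates strong and weak peaks better than FDR when FDR is already near zero for most peaks.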


In my opinion, you should choose a p-value cutoff according to your data :)

(comment by Lila M, 9 months ago)
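One way to choose a p-value cutoff "according to your data" is to count how many peaks survive a range of stricter cutoffs and see where the count matches what comparable datasets report. MACS14 applies a default cutoff of p = 1e-5, which is why all reported peaks already pass it. This sketch assumes you have extracted the `-10*log10(pvalue)` column into a list; the scores and the cutoff grid shown are hypothetical.

```python
import math


def pvalue_to_score(p):
    """Convert a p-value cutoff to the -10*log10(p) scale that MACS14 reports."""
    return -10.0 * math.log10(p)


def peaks_per_cutoff(scores, pvalues=(1e-5, 1e-8, 1e-10, 1e-15)):
    """Count peaks whose -10*log10(p) score passes each p-value cutoff."""
    counts = {}
    for p in pvalues:
        threshold = pvalue_to_score(p)
        counts[p] = sum(1 for s in scores if s >= threshold)
    return counts


if __name__ == "__main__":
    # Hypothetical scores; in practice read them from the peaks.xls file.
    scores = [120.5, 55.1, 81.0, 160.2, 62.0]
    for p, n in peaks_per_cutoff(scores).items():
        print(f"p <= {p:g}: {n} peaks")
```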

If you have replicates, you should look into an IDR (irreproducible discovery rate) analysis, as done by ENCODE. Essentially, you call peaks on each replicate individually and then keep the 'true' peaks, i.e. those that overlap each other in both files.

(comment by Sinji, 9 months ago)
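The "keep peaks found in both replicates" step can be sketched as a plain overlap filter. Note that this is simpler than IDR itself, which scores how consistently peaks are ranked between replicates and is normally run with dedicated software. The sketch assumes peaks are (chrom, start, end) tuples with half-open coordinates; the example data are hypothetical.

```python
import bisect
from collections import defaultdict
from itertools import accumulate


def reproducible_peaks(rep1, rep2):
    """Return peaks from rep1 that overlap at least one peak in rep2."""
    by_chrom = defaultdict(list)
    for chrom, start, end in rep2:
        by_chrom[chrom].append((start, end))
    # Per chromosome: sorted starts plus a running maximum of ends.
    lookup = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        max_ends = list(accumulate((e for _, e in ivs), max))
        lookup[chrom] = (starts, max_ends)
    kept = []
    for chrom, start, end in rep1:
        if chrom not in lookup:
            continue
        starts, max_ends = lookup[chrom]
        i = bisect.bisect_left(starts, end)  # rep2 peaks [0, i) start before `end`
        if i > 0 and max_ends[i - 1] > start:  # one of them ends after `start`
            kept.append((chrom, start, end))
    return kept


if __name__ == "__main__":
    rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
    rep2 = [("chr1", 150, 250), ("chr1", 700, 800), ("chr3", 0, 100)]
    print(reproducible_peaks(rep1, rep2))
```

On BED files, `bedtools intersect -u -a rep1.bed -b rep2.bed` performs the same kind of "report A entries with any overlap in B" filtering.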
Answer by Petr Ponomarenko (United States / Los Angeles / ALAPY.com), 9 months ago:

What protein are you analyzing, what is its binding area, and how many binding sites should there be (compared with other known experiments)? Most likely you can find public data for the same species, protein, sequencing machine and chemistry at NCBI GEO, along with a corresponding paper in a good peer-reviewed journal. You can run your pipeline on that dataset, optimize the parameters, and compare your results with the published data. This will give you a hint on how to analyze your own dataset and how to optimize the parameters for it. 96483 peaks could be just right, too many, or too few.

Also, even a direct comparison with another replicate can still leave you with consistent artifacts that are preserved between two different experiments, either because of systematic errors or (more likely) because certain regions of the genome are called erroneously due to their sequence complexity.
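Artifacts that reproduce across replicates are commonly handled by removing peaks that fall in known problematic regions, such as the ENCODE blacklist for the human genome. This is an illustration of that general idea, not something stated in the answer above. The sketch assumes peaks and blacklist regions are already loaded as (chrom, start, end) tuples with half-open coordinates; the regions below are made up.

```python
import bisect
from collections import defaultdict
from itertools import accumulate


def build_index(regions):
    """Per-chromosome sorted starts plus running max of ends, for overlap queries."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        index[chrom] = ([s for s, _ in ivs], list(accumulate((e for _, e in ivs), max)))
    return index


def hits(index, chrom, start, end):
    """True if [start, end) overlaps any indexed region on `chrom`."""
    if chrom not in index:
        return False
    starts, max_ends = index[chrom]
    i = bisect.bisect_left(starts, end)
    return i > 0 and max_ends[i - 1] > start


def remove_blacklisted(peaks, blacklist):
    """Drop every peak that overlaps at least one blacklist region."""
    index = build_index(blacklist)
    return [p for p in peaks if not hits(index, *p)]


if __name__ == "__main__":
    peaks = [("chr1", 100, 200), ("chr1", 1000, 1100), ("chr2", 10, 90)]
    blacklist = [("chr1", 150, 160), ("chr2", 0, 5000)]  # hypothetical regions
    print(remove_blacklisted(peaks, blacklist))
```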
