Question: Chip-Seq merging peak files
tkygyn wrote (19 months ago):

Hi all,

I am very new to all things ChIP-seq. We have performed multiple experiments, and now I have to analyze multiple files. I was told to essentially merge the replicates and use the mean of the distance for each gene.

Up to this point I agree, but although I understand merging the replicates, I was also told to merge the broad and narrow peak files.

From everything I've been reading, this sounds like a terrible idea, but I'm the new person. If I'm correct (there is always the chance that I'm completely wrong), what arguments would be best, and what could I use as a reference to support this position?

Thank you

Michele Busby (United States) wrote (19 months ago):

First, are these A) technical or B) biological replicates? That is, is it the same biological sample run several times with the same antibody (also the same lot, if polyclonal) and the same protocol, or different biological samples processed the same way with the same protocol?

If it is A, it may be reasonable to merge them for some analyses, such as simply annotating peaks. I would merge the BAM alignment files and then call peaks on the merged file, rather than merging the peak calls.
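To make "merge the alignments, then call" concrete: pooling technical replicates at the read level just interleaves coordinate-sorted reads into one sorted stream, which is conceptually what `samtools merge` does with sorted BAM files, so a single peak call then sees the combined coverage. The sketch below is a hypothetical illustration on plain (chromosome, position) tuples rather than real BAM records; the function name and toy reads are made up.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical read representation: (chromosome, 5' position).
Read = Tuple[str, int]


def merge_sorted_replicates(*replicates: Iterable[Read]) -> List[Read]:
    """Pool coordinate-sorted reads from several technical replicates.

    Each input must already be sorted by (chromosome, position). Note that
    real BAMs are sorted by the header's reference order, not by string
    comparison of chromosome names as these tuples are. heapq.merge keeps
    the pooled output sorted, ready for one peak call on the combined data.
    """
    return list(heapq.merge(*replicates))


rep1 = [("chr1", 100), ("chr1", 250), ("chr2", 40)]
rep2 = [("chr1", 120), ("chr1", 260), ("chr2", 10)]
pooled = merge_sorted_replicates(rep1, rep2)
print(pooled)
# [('chr1', 100), ('chr1', 120), ('chr1', 250), ('chr1', 260), ('chr2', 10), ('chr2', 40)]
```

Merging the calls instead would combine two already-thresholded interval sets, so reads that fall just below the cutoff in each replicate separately never get the chance to add up.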

However, you first have to analyze your replicates to check that they all perform the same. We did a lot of performance comparisons here: https://epigeneticsandchromatin.biomedcentral.com/articles/10.1186/s13072-016-0100-6

You can steal those ideas, especially using the ENCODE segmentation tracks if your data are human and ENCODE has tracks for something like your cell type. But just counting the reads in bins and then computing a correlation between replicates is pretty informative.
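A minimal sketch of the bin-and-correlate check, assuming reads have already been reduced to their positions on a single chromosome. The function names, the 1 kb default bin size, and the log1p transform are illustrative choices, not taken from the paper. Pearson correlation is used here; Spearman on the raw counts is a common alternative.

```python
import math
from typing import List, Sequence


def bin_counts(positions: Sequence[int], chrom_length: int, bin_size: int) -> List[int]:
    """Count reads (given by their positions) falling into fixed-width bins."""
    counts = [0] * math.ceil(chrom_length / bin_size)
    for pos in positions:
        counts[pos // bin_size] += 1
    return counts


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def replicate_correlation(rep_a: Sequence[int], rep_b: Sequence[int],
                          chrom_length: int, bin_size: int = 1000) -> float:
    # log1p dampens the pull of a few very deep bins (an optional, common choice).
    a = [math.log1p(c) for c in bin_counts(rep_a, chrom_length, bin_size)]
    b = [math.log1p(c) for c in bin_counts(rep_b, chrom_length, bin_size)]
    return pearson(a, b)


# Hypothetical read positions on a 10 kb chromosome.
rep1 = [120, 180, 950, 1100, 1150, 1190, 7300]
rep2 = [90, 400, 1020, 1500, 7100, 7200]   # similar enrichment pattern
rep3 = [4100, 4200, 4300, 9100]            # a replicate that "behaves strangely"

r12 = replicate_correlation(rep1, rep2, 10_000)
r13 = replicate_correlation(rep1, rep3, 10_000)
print(f"rep1 vs rep2: {r12:.2f}, rep1 vs rep3: {r13:.2f}")
```

On real data you would do this genome-wide (all chromosomes concatenated) and look at the whole matrix of pairwise correlations; a replicate that correlates poorly with all the others is the one to inspect.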

But even in our data, where we used a robot and run this assay a lot, one of our technical replicates behaved strangely (see Supplementary Figure S6).

If it is B, biological replicates, you almost certainly don't want to merge them. You will lose the information about what biological variance is present. If you are looking at something like differential peaks between conditions, DESeq and really all reputable programs will want replicates, almost always biological ones. In general, if you want to compute a p value on anything, you need separate replicates (not merged).
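Why merged data cannot give a p value: any test needs an estimate of the between-replicate spread, and a single pooled number has none. DESeq models counts with a negative binomial distribution; the sketch below swaps in a much simpler Welch t-statistic on hypothetical normalized counts for one region, purely to show where the replicates enter. It is not what DESeq computes.

```python
import math
import statistics
from typing import Sequence


def welch_t(cond_a: Sequence[float], cond_b: Sequence[float]) -> float:
    """Welch t-statistic comparing one region's counts between two conditions.

    Needs at least two replicates per condition: statistics.variance raises
    StatisticsError for a single value, since no spread can be estimated.
    """
    var_a = statistics.variance(cond_a)
    var_b = statistics.variance(cond_b)
    se = math.sqrt(var_a / len(cond_a) + var_b / len(cond_b))
    return (statistics.mean(cond_a) - statistics.mean(cond_b)) / se


# Hypothetical normalized read counts in one peak region, three biological replicates each.
control = [52.0, 47.0, 58.0]
treated = [95.0, 110.0, 88.0]
print(f"Welch t = {welch_t(treated, control):.2f}")

# Merging the replicates collapses each condition to a single number...
merged_control = [sum(control)]
merged_treated = [sum(treated)]
try:
    welch_t(merged_treated, merged_control)
    merged_testable = True
except statistics.StatisticsError as err:
    # ...so there is no variance left to test against.
    merged_testable = False
    print("cannot test merged data:", err)
```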

If you are just annotating peaks you don't need a p value.


tkygyn replied:

Thank you. The paper was a great help.
mforde84 wrote (19 months ago):

Hi!

You can merge peaks from distinct biological replicates, though as Michele points out, there's no real follow-up analysis you can do after doing so; you can only really describe the data.

I'd suggest looking closely at the ENCODE TF pipeline (see links below). We are interested in the reproducibility of peaks (whether narrow or broad), and I assume that's close to what you're being asked to do. We generate a biological replicate pool, and pseudoreplicates from both the pool and the individual biological replicates. We then call peaks and perform an IDR (irreproducible discovery rate) analysis across all of these conditions.
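The pseudoreplicate step can be sketched as a random, roughly equal split of one sample's reads into two halves, done for each biological replicate and for the pool; peaks called on the halves are then compared. The function below is a hypothetical illustration on plain read names, not code from the ENCODE pipeline or the linked repository. A fixed seed keeps the split reproducible.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def pseudoreplicates(reads: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly split one sample's reads into two pseudoreplicates of near-equal size."""
    rng = random.Random(seed)  # private RNG, so the split is reproducible
    shuffled = list(reads)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical read names from two biological replicates.
rep1 = [f"rep1_read{i}" for i in range(10)]
rep2 = [f"rep2_read{i}" for i in range(8)]
pooled = rep1 + rep2  # the biological replicate pool

pr1_a, pr1_b = pseudoreplicates(rep1, seed=1)    # pseudoreplicates of rep1
pool_a, pool_b = pseudoreplicates(pooled, seed=2)  # pooled pseudoreplicates
print(len(pool_a), len(pool_b))  # 9 9
```

Because each half comes from the same sample, good agreement between pseudoreplicates mostly reflects sampling depth; comparing that to the agreement between true biological replicates is what tells you whether the biology is reproducible.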

This is the official ENCODE writeup page: https://sites.google.com/site/anshulkundaje/projects/idr

This is my GitHub for a working pipeline deployed on an AWS-like cloud environment: https://github.com/mforde84/ENCODE_TF_ChIP_pipeline

Have fun, ChIP is a lot of fun to work with!

M

apa@stowers (Kansas City) wrote (19 months ago):

I'm not sure why you would be merging "narrow" and "broad" peak calls. Generally you call one type of peak or the other, and only one of these is appropriate, depending on what it is you ChIPped: for instance, a transcription factor (narrow) or a histone mark (broad). The exception might be an enzyme, say Pol2, which can show mixed behaviors that may require multiple peak-calling strategies.
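One way to see what a naive merge does: taking the union of overlapping intervals (roughly what `bedtools merge` does on a sorted BED file) lets any broad domain absorb the narrow peaks inside it, so the precise narrow boundaries are lost. The sketch uses hypothetical half-open intervals on a single chromosome; the function name and coordinates are made up.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # hypothetical (start, end), half-open, one chromosome


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Union overlapping or touching intervals into non-overlapping blocks."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous block: extend that block.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


narrow = [(1_000, 1_300), (1_800, 2_050), (9_000, 9_250)]  # e.g. narrowPeak calls
broad = [(500, 6_000)]                                      # e.g. one broadPeak domain
combined = merge_intervals(narrow + broad)
print(combined)  # [(500, 6000), (9000, 9250)]
```

The two narrow peaks at 1,000 and 1,800 no longer exist as separate regions, so anything computed per peak afterwards (such as a distance to the nearest gene) would be measured from the 5.5 kb block instead.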

I also definitely recommend using IDR, and I use it all the time. BUT: be sure to use version 2. Version 1 had serious bugs!
