Question

Replicates In Chip-Seq

10.6 years ago

Nick ▴ 290

I have the following dataset:

wild type:

2 male biol ChIP replicates for TF A
2 female biol ChIP replicates for TF B
1 male ChIP sample for TF C
1 male ChIP sample for TF D
1 male input sample (from one of the animals used for one of the samples for TF A)
1 female input sample (from one of the animals used for one of the samples for TF A)

knockouts:

1 male ChIP sample for TF A
1 pooled (male+female) ChIP sample for TF B

All animals are of similar age.

The main interest is the contrast between knockouts and wild types for TF A and TF B. I have the following questions:

(1) Does it make sense to take into account the samples for TF C and TF D?

(2) Does it make sense to take sex into account (no sex-specific effect is expected)?

(3) How can I make the best possible use of this data, and which tool would you recommend? I have used macs. I am also aware of DiffBind, MEDIPS, diffReps and DBChIP but haven't used any of them, so any specific recommendation regarding a tool and a workflow (if more than one tool is to be used) is most welcome.

chip-seq replicates model • 4.7k views

updated 10.6 years ago by Ying W ★ 4.2k • written 10.6 years ago by Nick ▴ 290

Ram · Answer 1 · 2013-09-12

(1) No.

(2) This is easier to do in DiffBind than in DBChIP, but DBChIP has a better way of estimating dispersion when there are no replicates (unless you use the same dispersion for the knockout as for the wild type).

(3) In my experience, macs2 does not give too many results. If you know R well, I would count reads (using bedtools) and experiment with some of the edgeR functions, for example normalizing by the median or by full library size and making MA plots. If all samples are the same cell type, subtracting input might not be as important. If you want something simple to run, go with DiffBind or DBChIP; the former is better documented. Another program you might want to consider is MAnorm.
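To make the normalization and MA-plot idea concrete, here is a minimal sketch in plain Python (not edgeR). It assumes you already have per-peak read counts for two samples, for example from bedtools. The function name `ma_values`, its parameters and the pseudocount are hypothetical choices for illustration, and the "median" option is a simple median-of-ratios scaling, not necessarily what any particular package does.

```python
import math
from statistics import median


def ma_values(counts_a, counts_b, method="library", lib_sizes=None, pseudocount=1.0):
    """Return (M, A) lists for two samples' per-peak read counts.

    method="library": scale each sample to counts per million, using
        lib_sizes=(size_a, size_b) if given, else the sum of the peak counts
        (an assumption; the full library size is usually the total mapped reads).
    method="median": scale sample B so the median B/A count ratio becomes 1.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both samples must cover the same peaks")

    if method == "library":
        size_a, size_b = lib_sizes if lib_sizes else (sum(counts_a), sum(counts_b))
        scale_a, scale_b = 1e6 / size_a, 1e6 / size_b
    elif method == "median":
        ratios = [(b + pseudocount) / (a + pseudocount)
                  for a, b in zip(counts_a, counts_b)]
        scale_a, scale_b = 1.0, 1.0 / median(ratios)
    else:
        raise ValueError("method must be 'library' or 'median'")

    m_vals, a_vals = [], []
    for a, b in zip(counts_a, counts_b):
        log_a = math.log2(a * scale_a + pseudocount)
        log_b = math.log2(b * scale_b + pseudocount)
        m_vals.append(log_b - log_a)        # M: log2 fold change, B over A
        a_vals.append((log_a + log_b) / 2)  # A: mean log2 signal
    return m_vals, a_vals


# Toy counts (made up): sample B is sequenced twice as deep as sample A.
wt_counts = [10, 20, 30, 40]
ko_counts = [20, 40, 60, 80]
m_lib, a_lib = ma_values(wt_counts, ko_counts, method="library")
m_med, a_med = ma_values(wt_counts, ko_counts, method="median")
```

Plotting M against A for each normalization and checking whether the cloud is centred on M = 0 is a quick way to see which scaling behaves sensibly for your samples.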

Ram · Answer 2 · 2013-09-09

My two cents:

(1) I don't see why you would take TF C and TF D into account for the TF A vs TF B contrast.

(2) You could try taking sex into account and compare that to a model where you don't. This should be fairly easy in a DESeq/edgeR-like method such as DiffBind or DBChIP.

(3) I would try DiffBind: make peak sets out of the various TF samples vs. input, encode as much information as possible (which TF, knockout or not, sex, ...) in a metadata table, and try the GLM functionality in DiffBind.
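Before fitting models with and without sex, it may help to check whether the metadata table can support a sex term at all. The sketch below is plain Python, not DiffBind. It encodes the TF A and TF B samples from the question (treating the pooled knockout as its own sex level, which is an assumption) and compares the rank of the design matrix with its number of columns. The names `SAMPLES`, `design_matrix` and `matrix_rank` are hypothetical.

```python
from fractions import Fraction

# TF A / TF B samples as described in the question; "pooled" is an assumed
# encoding for the mixed male+female knockout sample.
SAMPLES = [
    {"tf": "A", "condition": "WT", "sex": "male"},
    {"tf": "A", "condition": "WT", "sex": "male"},
    {"tf": "B", "condition": "WT", "sex": "female"},
    {"tf": "B", "condition": "WT", "sex": "female"},
    {"tf": "A", "condition": "KO", "sex": "male"},
    {"tf": "B", "condition": "KO", "sex": "pooled"},
]


def design_matrix(samples, factors):
    """Intercept plus treatment-coded dummy columns for each factor."""
    columns = [("intercept", [1] * len(samples))]
    for factor in factors:
        levels = sorted({s[factor] for s in samples})
        for level in levels[1:]:  # the first level acts as the reference
            columns.append((f"{factor}={level}",
                            [int(s[factor] == level) for s in samples]))
    names = [name for name, _ in columns]
    rows = [[col[i] for _, col in columns] for i in range(len(samples))]
    return names, rows


def matrix_rank(rows):
    """Matrix rank via Gaussian elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


names_full, rows_full = design_matrix(SAMPLES, ["tf", "condition", "sex"])
names_reduced, rows_reduced = design_matrix(SAMPLES, ["tf", "condition"])
rank_full = matrix_rank(rows_full)
rank_reduced = matrix_rank(rows_reduced)
```

A rank below the number of columns means at least one coefficient cannot be estimated separately. With this encoding every TF A sample is male and no TF B sample is, so the sex columns are a linear combination of the intercept and the TF column, and a sex term cannot be separated from TF identity in a joint model.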

Ram · Answer 3 · 2013-09-10

Is there a way to run DiffBind or any of the other tools without an input file for each sample?

I tried to use DiffBind, but it seems to expect an input file for each sample. I have two such files for the wild type. I merged them and passed the merged input to macs as the control when calling peaks in the wild types. For the knockouts I ran macs without any input files.

I then tried to run DiffBind with a sample sheet in which all wild-type samples pointed to the same merged input file and the input cells for the knockouts were left blank, but DiffBind complained about the missing input files. So I reused the merged wild-type input for the knockouts as well.

I don't feel good about it. Is there a way to do differential binding analysis that takes replicates into account (I do have them for the wild type) but also tolerates the lack of replicates or even of input files?

I know this is not a good arrangement but this is not my data - I am just trying to tease out as much signal from it without doing anything improper.
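For reference, the arrangement described above (one merged wild-type input reused for every sample) could be written out roughly as follows. This is only a sketch in Python that builds the CSV text. The column names follow my understanding of the DiffBind sample sheet format and should be checked against the DiffBind vignette, and every sample ID and file path is hypothetical. It does not settle whether reusing the wild-type input for the knockouts is appropriate.

```python
import csv
import io

# Column names as I understand the DiffBind sample sheet; verify before use.
COLUMNS = ["SampleID", "Factor", "Condition", "Replicate",
           "bamReads", "ControlID", "bamControl", "Peaks", "PeakCaller"]

# Hypothetical path to the merged wild-type input.
MERGED_INPUT = "input/wt_merged_input.bam"

# (sample ID, factor, condition, replicate); IDs are made up for illustration.
SAMPLES = [
    ("A_WT_1", "TF_A", "WT", 1),
    ("A_WT_2", "TF_A", "WT", 2),
    ("B_WT_1", "TF_B", "WT", 1),
    ("B_WT_2", "TF_B", "WT", 2),
    ("A_KO_1", "TF_A", "KO", 1),
    ("B_KO_1", "TF_B", "KO", 1),
]


def make_row(sample_id, factor, condition, replicate):
    """One sample sheet row; every sample points at the same merged input."""
    return {
        "SampleID": sample_id,
        "Factor": factor,
        "Condition": condition,
        "Replicate": replicate,
        "bamReads": f"reads/{sample_id}.bam",          # hypothetical path
        "ControlID": "WT_merged_input",
        "bamControl": MERGED_INPUT,
        "Peaks": f"peaks/{sample_id}_peaks.xls",       # hypothetical path
        "PeakCaller": "macs",
    }


def build_sample_sheet(samples=SAMPLES):
    """Return the sample sheet as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for sample in samples:
        writer.writerow(make_row(*sample))
    return buffer.getvalue()


sheet_text = build_sample_sheet()
```

Writing `sheet_text` to a file gives a sheet in which the knockout rows carry the same `bamControl` as the wild-type rows, which is exactly the compromise I am uneasy about.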