Hi All,
I'm currently analyzing a dataset containing 6 biological replicates for each of two conditions: 6 Condition1 vs 6 Condition2. The experiments were done some time ago in 3 different batches, e.g. on 3 different days. I call peaks separately for every sample (n = 12), using the Input of that batch as a control. Then I use DiffBind 3.0 to detect the common peakset and find differentially bound peaks.
The problem I'm stuck with: when I use all the available replicates, I get very few differentially bound peaks (11!). Combining replicates from different batches increases the number of differentially bound regions, though I'm not sure how much of that is due to a batch effect.
I would really appreciate your tips on the following:
1) How can I pick proper replicates for the DiffBind analysis, given that the ChIP-seq fingerprint plots look similar for the majority of replicates?
2) Is it appropriate to use samples from different batches, e.g. 1+2+3 for Condition1 and 1+4+5 for Condition2? Or do I need to include a multi-factor design in DiffBind to account for my batch effect?
Thanks in advance,
Svetlana
Did you provide the batch information to the design, something like
~batch + condition
? This is how one commonly corrects for a batch effect, provided that each condition has replicates in all batches. Otherwise condition would be confounded with batch and you could not correct for it.
Thanks for a fast reply! Yes, I saw this option in the DiffBind vignette and was wondering about that. However, I am still not sure whether it would be biologically appropriate to go forward with picking replicates from different batches, although this way gave me a much higher number of differentially bound sites (~1,500).
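The balance requirement mentioned in the reply (each condition needs replicates in every batch) can be checked before running any differential analysis. Below is a minimal Python sketch of such a check. It is not DiffBind code, and the sample names, day labels, and the assignment of replicates to days are all hypothetical, since the thread does not say which replicate was processed on which day.

```python
from collections import defaultdict

# Hypothetical sample sheet: (sample_id, condition, batch).
# The replicate-to-day assignment is invented for illustration only.
samples = [
    ("C1_rep1", "Condition1", "day1"),
    ("C1_rep2", "Condition1", "day1"),
    ("C1_rep3", "Condition1", "day2"),
    ("C1_rep4", "Condition1", "day2"),
    ("C1_rep5", "Condition1", "day3"),
    ("C1_rep6", "Condition1", "day3"),
    ("C2_rep1", "Condition2", "day1"),
    ("C2_rep2", "Condition2", "day1"),
    ("C2_rep3", "Condition2", "day2"),
    ("C2_rep4", "Condition2", "day2"),
    ("C2_rep5", "Condition2", "day3"),
    ("C2_rep6", "Condition2", "day3"),
]


def crosstab(sample_rows):
    """Count samples per batch and condition."""
    counts = defaultdict(lambda: defaultdict(int))
    for _sample_id, condition, batch in sample_rows:
        counts[batch][condition] += 1
    return {batch: dict(per_cond) for batch, per_cond in counts.items()}


def batches_missing_a_condition(sample_rows):
    """Return the batches that do not contain every condition."""
    all_conditions = {condition for _id, condition, _batch in sample_rows}
    table = crosstab(sample_rows)
    return sorted(
        batch for batch, per_cond in table.items()
        if set(per_cond) != all_conditions
    )


# All 12 samples: every day holds both conditions, so batch can be modelled.
print(crosstab(samples))
print(batches_missing_a_condition(samples))

# A hand-picked subset in which each day holds only one condition:
# here batch and condition cannot be separated.
picked = {"C1_rep1", "C1_rep2", "C2_rep3", "C2_rep4", "C2_rep5"}
subset = [row for row in samples if row[0] in picked]
print(batches_missing_a_condition(subset))
```

If the second function returns an empty list, every batch contains both conditions and a design such as ~batch + condition can separate the two effects. If it lists batches, the samples in those batches cannot help distinguish condition from batch, which is the confounding the reply warns about.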