Question

ChIP-seq or Cut and Run Differential Binding Analysis

0

3.5 years ago

Pappu ★ 2.1k

I am trying to do differential binding analysis of ChIP-seq and Cut and Run data using DiffBind. I have 2 normal samples, 2 normal IgG control samples, 2 treated samples and 2 treated IgG control samples. If I call peaks with MACS2 after Bowtie2 alignment and duplicate removal, I get the following peaks:

Normal 1 over Normal IGG Control 1
Normal 1 over Normal IGG Control 1
Normal 2 over Normal IGG Control 2
Normal 2 over Normal IGG Control 2

and

Treated 1 over Treated IGG Control 1
Treated 1 over Treated IGG Control 1
Treated 2 over Treated IGG Control 2
Treated 2 over Treated IGG Control 2

My question: is it standard to use both controls for such an analysis in DiffBind? If not, what is the standard workflow?

ChIP-Seq • 3.7k views

ADD COMMENT • link updated 3.5 years ago by Rory Stark ★ 2.0k • written 3.5 years ago by Pappu ★ 2.1k

1

You will probably not need the IgG samples during DE analysis. I personally only find them useful during peak calling, to correct for background enrichment. They are simply too different from the IPs to be included in the DE analysis as part of some kind of interaction model. Therefore it comes down to a standard 2 vs 2 comparison. DiffBind is an option; alternatively, check csaw or simply feed the count matrix directly into edgeR. Either way, proper QC, as discussed in both the DiffBind and csaw vignettes, should be done before any DE testing.

ADD REPLY • link 3.5 years ago by ATpoint 82k

0

Thanks. So for MACS2 peak calling, I would merge the two controls. Then I will have 2 peak sets for Normal and 2 peak sets for Treated. Do the IgG controls for Normal and Treated need to be quite similar to avoid confounded results? Having 4 controls is the reason for my confusion.

ADD REPLY • link 3.5 years ago by Pappu ★ 2.1k

score 1 · Answer 1 · 2020-11-02

I'm not sure why you have two sets of peaks for each pair? Or is it e.g. Normal 1 vs Normal IGG 1 and Normal 1 vs Normal IGG 2 etc?

There are a few ways to handle the controls in DiffBind:

Ignore the controls: You can run a differential analysis without reference to the controls, using a consensus peakset derived from the peaks that were called against the controls. The idea is that the controls were already used in identifying the enriched areas, and now you can look for consistent changes in read counts within those areas.
Greylists: You can derive "greylists" (experiment-specific blacklists) from the IgG controls to identify anomalous regions that should be excluded from subsequent analysis. These can be generated automatically from within DiffBind (this is now the default way to handle controls).
Subtracting IgG reads: If you don't use greylists, you can specify the "matched" control for each primary sample and subtract the IgG reads. If there is a large pileup in the IgG, it will dampen or cancel out the main signal. DiffBind will handle this case as well, including scaling the control reads if the library sequencing depths are mismatched.
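To make the subtraction option concrete, here is a minimal Python sketch of depth-scaled control subtraction. It is an illustration only, not DiffBind's code: the function name, the simple proportional scaling and the floor at zero are all assumptions of the sketch, and the counts are made up.

```python
def subtract_control(chip_counts, control_counts, chip_depth, control_depth):
    """Subtract depth-scaled control counts from ChIP counts, per peak.

    Assumptions of this sketch (not taken from DiffBind): the control is
    scaled by the ratio of library sizes, and negative results are
    floored at zero.
    """
    scale = chip_depth / control_depth
    return [
        max(0.0, chip - scale * ctrl)
        for chip, ctrl in zip(chip_counts, control_counts)
    ]


# Hypothetical counts for three peaks; the control library is twice as deep.
adjusted = subtract_control(
    chip_counts=[120, 40, 300],
    control_counts=[20, 100, 10],
    chip_depth=10_000_000,
    control_depth=20_000_000,
)
print(adjusted)
```

The second peak shows the "cancel out" case: a large IgG pileup removes the ChIP signal entirely once the control is scaled.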

When you make your samplesheet, you can specify the appropriate control for each sample (e.g. Normal IGG Control 1 for Normal 1).
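As a rough illustration of such a samplesheet, the Python sketch below writes one row per primary sample with its matched IgG control. The column names follow the DiffBind samplesheet layout as I understand it (worth checking against the vignette); the sample IDs, file paths and the `narrow` peak format value are hypothetical placeholders.

```python
import csv
import io

# Column names as used in DiffBind samplesheets (verify against the vignette).
COLUMNS = ["SampleID", "Condition", "Replicate", "bamReads",
           "ControlID", "bamControl", "Peaks", "PeakCaller"]


def build_sample_sheet(conditions=("Normal", "Treated"), replicates=(1, 2)):
    """Return one row per primary sample, each paired with its own IgG control."""
    rows = []
    for condition in conditions:
        for rep in replicates:
            sample = f"{condition}{rep}"
            control = f"{condition}_IgG{rep}"  # hypothetical naming scheme
            rows.append({
                "SampleID": sample,
                "Condition": condition,
                "Replicate": rep,
                "bamReads": f"bam/{sample}.bam",          # hypothetical path
                "ControlID": control,
                "bamControl": f"bam/{control}.bam",       # hypothetical path
                "Peaks": f"peaks/{sample}_peaks.narrowPeak",
                "PeakCaller": "narrow",                   # assumed format label
            })
    return rows


def to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sheet = build_sample_sheet()
print(to_csv(sheet))
```

Only the four primary samples get rows; the IgG files appear as controls, not as samples of their own.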

score 0 · Answer 2 · 2020-11-03

If the controls are matched -- IGG Control 1 was done in the same "batch" as Normal 1 -- you can just call peaks over the matched control.
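Calling peaks over the matched control could look like the sketch below, which only builds and prints one `macs2 callpeak` command per pair. The BAM paths, sample names, output directory and the `hs` genome size are assumptions; for paired-end data `-f BAMPE` may be the better choice.

```python
# Matched pairs: each primary sample with the IgG control from the same batch.
# Sample names and paths are hypothetical.
PAIRS = [
    ("Normal1", "Normal_IgG1"),
    ("Normal2", "Normal_IgG2"),
    ("Treated1", "Treated_IgG1"),
    ("Treated2", "Treated_IgG2"),
]


def macs2_command(sample, control, genome="hs", outdir="peaks"):
    """Build a macs2 callpeak command for one sample and its matched control."""
    return [
        "macs2", "callpeak",
        "-t", f"bam/{sample}.bam",    # treatment (IP) reads
        "-c", f"bam/{control}.bam",   # matched IgG control reads
        "-f", "BAM",
        "-g", genome,                 # effective genome size (assumed human)
        "-n", sample,
        "--outdir", outdir,
    ]


commands = [macs2_command(sample, control) for sample, control in PAIRS]
for command in commands:
    print(" ".join(command))
```

The commands are printed rather than executed, so they can be inspected or pasted into a job script.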

As you are doing a quantitative differential analysis with these data, there's no need to over-think the controls. The peak calling is just a step to identify potential sites of interest, which will only be identified as being differential if the counts consistently differ. I'd be more concerned that you only have two replicates for each condition than getting overly fancy with the IgG controls; if there is much variance in your data, there may not be enough replicates to confidently identify differential sites.

If I were doing this analysis, I'd take the following steps:

Generate greylists from the four IgG samples, and merge them.
Filter reads from the Normal/Treated samples that overlap greylisted regions (as well as ENCODE blacklisted regions, if a blacklist exists for your reference genome).
Call four sets of peaks (Normal 1/IgG Control 1 etc.).
Form a consensus peakset from the four sets and count overlapping reads.
Normalize to background bins over the filtered reads.
Perform a differential analysis on this count matrix.
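The consensus and counting steps can be sketched in plain Python as below. This is a toy illustration on made-up intervals, not what DiffBind does internally: the half-open `(chrom, start, end)` representation, the function names and the "supported by at least two peak sets" threshold are assumptions of the sketch, and normalization and the differential test itself are left out.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def consensus_peaks(peaksets, min_overlap=2):
    """Merge all peaks; keep regions supported by at least min_overlap peak sets."""
    merged = merge_intervals([peak for peaks in peaksets for peak in peaks])
    return [
        region for region in merged
        if sum(any(overlaps(region, p) for p in peaks) for peaks in peaksets) >= min_overlap
    ]


def count_reads(reads, peaks):
    """Count reads overlapping each consensus peak."""
    return [sum(overlaps(read, peak) for read in reads) for peak in peaks]


# Made-up peak sets for the four primary samples.
peaksets = {
    "Normal1": [("chr1", 1000, 1200), ("chr1", 3000, 3100)],
    "Normal2": [("chr1", 1100, 1300)],
    "Treated1": [("chr1", 5000, 5200)],
    "Treated2": [("chr1", 1150, 1250), ("chr1", 5100, 5300)],
}
consensus = consensus_peaks(list(peaksets.values()))

# Made-up reads for one sample; one column of the count matrix.
reads_normal1 = [("chr1", 1010, 1060), ("chr1", 1250, 1300),
                 ("chr1", 2000, 2050), ("chr1", 5290, 5340)]
counts_normal1 = count_reads(reads_normal1, consensus)
print(consensus, counts_normal1)
```

Repeating `count_reads` for each of the four samples gives the count matrix that the differential analysis would then work on.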

A simplified version of the above is to calculate and apply the blacklists/greylists after peak calling and filter peaks instead of reads; this can all be done straightforwardly in DiffBind once peaks are called with the primary BAM files.
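For the simplified variant, filtering peaks rather than reads, a minimal Python sketch could look like this. The greylist regions and peaks are invented, the helper names are hypothetical, and dropping any peak that touches an excluded region is an assumption; DiffBind's own handling may differ in detail.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def filter_peaks(peaks, excluded):
    """Drop every peak that overlaps any excluded (greylisted) region."""
    return [p for p in peaks if not any(overlaps(p, region) for region in excluded)]


# One made-up greylist per IgG sample (the fourth is empty).
igg_greylists = [
    [("chr1", 100, 200)],
    [("chr1", 150, 300)],
    [("chr2", 50, 80)],
    [],
]
greylist = merge_intervals([region for gl in igg_greylists for region in gl])

# Made-up called peaks.
peaks = [("chr1", 250, 400), ("chr1", 1000, 1300),
         ("chr2", 10, 60), ("chr2", 400, 500)]
kept = filter_peaks(peaks, greylist)
print(greylist, kept)
```

An ENCODE blacklist, where available, could be added to the same `excluded` list before filtering.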