chip-seq differential binding
4.7 years ago

This is kind of a continuation from my previous question about using both spike-in normalization and input normalization in differential binding analysis (https://www.biostars.org/p/203724/).

I am also posting this question at the Bioconductor forum (https://support.bioconductor.org/p/85810/).

I read through the other posts on the Bioconductor forum about using RNA-seq packages to analyze ChIP-seq data (https://support.bioconductor.org/p/72098/). The general consensus seems to be to either ignore the input control samples or build a blacklist, and to look only at differential binding between IP samples.

If I do want to incorporate the input controls into my differential binding analysis, is it valid to include them as another factor in the design matrix and perform a difference-of-differences type of analysis?

So for every library, I would have an "IP" factor with two levels (IP/Input) and also a "sample" factor with two levels (treatment/control).

I've been trying to learn more about R, and this series of tutorials seems to cover building a difference-of-differences type of contrast (http://genomicsclass.github.io/book/pages/interactions_and_contrasts.html). I am not very well versed in R. Would it be possible to make this kind of contrast, and would it even be valid?

4.7 years ago

Yeah, if you want to include the inputs then use them as a factor and keep them paired to their associated IP sample. The final structure should be like a human case-control or tumor-"surrounding tissue" experiment. Regarding the design in R, whether you need a contrast at all is dependent on how you set things up. I'd use something like ~IP*condition, where IP is a factor of ChIP and input levels and condition is whatever your experimental groups are. With that design you'd just take the interaction coefficient and not bother with a contrast.
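To make the interaction term concrete, here is a minimal sketch of what it measures for a single peak. It is written in plain Python rather than R, uses made-up counts, and assumes input and control are the reference levels. It only shows the arithmetic of the coefficients on a log2 scale; an RNA-seq package would fit a proper count model with replicates and dispersion estimates.

```python
import math

# Hypothetical counts for one peak region (illustrative numbers, not real data).
counts = {
    ("input", "control"): 50,
    ("ChIP", "control"): 200,
    ("input", "treatment"): 50,
    ("ChIP", "treatment"): 800,
}


def log2_count(ip, condition):
    """Return the log2 count for one IP/condition cell."""
    return math.log2(counts[(ip, condition)])


# Coefficients of ~IP*condition, assuming reference levels IP=input, condition=control.
intercept = log2_count("input", "control")
ip_effect = log2_count("ChIP", "control") - intercept
condition_effect = log2_count("input", "treatment") - intercept

# Interaction = (ChIP - input in treatment) - (ChIP - input in control),
# i.e. the difference of differences.
interaction = (
    (log2_count("ChIP", "treatment") - log2_count("input", "treatment"))
    - (log2_count("ChIP", "control") - log2_count("input", "control"))
)

# Sanity check: the four coefficients add up to the ChIP/treatment cell.
fitted = intercept + ip_effect + condition_effect + interaction

print(f"interaction (log2 scale): {interaction:.2f}")
```

With these numbers the enrichment over input is 4-fold in control and 16-fold in treatment, so the interaction is the log2 of the 4-fold change in enrichment.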

BTW, be careful with the library size normalization. The naive method will be to scale everything based on counts from the peak regions, which will likely remove the difference between IP and input. You'd be better off getting counts in non-peak and non-blacklisted regions, using those to generate the normalization factors and then applying that to the counts from peaks.
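A rough sketch of that idea, with hypothetical library names and counts: the size factors come only from reads in background (non-peak, non-blacklisted) regions and are then applied to the peak counts. Simple total-count scaling is used here as an assumption to keep the example short; the packages discussed use more robust estimators.

```python
# Hypothetical read totals in non-peak, non-blacklisted regions (illustrative only).
background_counts = {
    "trt_ChIP": 4_000_000,
    "trt_input": 8_000_000,
    "ctl_ChIP": 2_000_000,
    "ctl_input": 8_000_000,
}

# Hypothetical read counts in one peak region for the same libraries.
peak_counts = {
    "trt_ChIP": 800,
    "trt_input": 100,
    "ctl_ChIP": 200,
    "ctl_input": 100,
}

# Size factor = a library's background total relative to the mean background total.
mean_background = sum(background_counts.values()) / len(background_counts)
size_factors = {lib: n / mean_background for lib, n in background_counts.items()}

# Apply the background-derived factors to the peak counts.
normalized_peaks = {lib: peak_counts[lib] / size_factors[lib] for lib in peak_counts}

for lib, value in normalized_peaks.items():
    print(f"{lib}: size factor {size_factors[lib]:.3f}, normalized peak count {value:.1f}")
```

Because the factors are never computed from the peak regions themselves, the enrichment of ChIP over input in the peak survives the normalization.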

My guess is that including the inputs is mostly useful when the peak calling isn't that great...but I'd be curious to hear what you get :)


I am using deepTools' SES method to calculate the input scaling. I think it basically does what you are suggesting: it separates coverage into background and signal components and scales based on the background. I am combining that with the spike-in normalization factors to calculate an overall set of scaling factors that should be consistent across all the samples.


I guess I can't complain if you're using deepTools ;)

Edit: I should warn you that SES is appropriate for generating a scaling factor between IP and input samples, but that won't help much for differences between input/IP pairs. If your sequencing depth is pretty uniform across the IP samples then this should be fine, but otherwise it might need some tweaking.


This is how I am generating the scaling factors. I hope it makes sense and is correct:

I calculated scaling factors for my treatment/control from spike-ins. For example:

Treatment : Control = 1 : 0.5


For treatment and control, I generated SES factors using the deepTools API:

Treatment IP : Treatment Input = 1 : 0.6
Control IP : Control Input = 1 : 0.5


To then get a consistent scaling factor among all treatment/control/ip/input, I just multiply the scaling factors from spike-in and SES:

Treatment IP = 1 (1 * 1)
Treatment Input = 0.6 (1 * 0.6)
Control IP = 0.5 (0.5 * 1)
Control Input = 0.25 (0.5 * 0.5)
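The same arithmetic as a short Python sketch, using the example values above; the dictionary names are only for illustration and are not part of the deepTools API.

```python
# Spike-in scaling factors between conditions (example values from above).
spike_in = {"Treatment": 1.0, "Control": 0.5}

# SES scaling factors between IP and input within each condition (example values).
ses = {
    "Treatment": {"IP": 1.0, "Input": 0.6},
    "Control": {"IP": 1.0, "Input": 0.5},
}

# Combined factor per library = spike-in factor * SES factor.
combined = {
    f"{condition} {library}": spike_in[condition] * factor
    for condition, libraries in ses.items()
    for library, factor in libraries.items()
}

for name, value in combined.items():
    print(f"{name} = {value:g}")
```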


That seems reasonable enough :)