Question: ChIP-seq differential binding
Damian Kao wrote (3.1 years ago):

This is kind of a continuation from my previous question about using both spike-in normalization and input normalization in differential binding analysis (https://www.biostars.org/p/203724/).

I am also posting this question at the Bioconductor forum (https://support.bioconductor.org/p/85810/).

I read through the other posts on the Bioconductor forum about using RNA-seq packages to analyze ChIP-seq data (https://support.bioconductor.org/p/72098/). The general consensus seems to be to either ignore the input control samples or build a blacklist, and to look only at differential binding between IP samples.

If I do want to incorporate the input controls into my differential binding analysis, is it valid to include that data as another factor in the design matrix and perform a difference-of-differences analysis?

So for every library, I would have an "IP" factor with two levels (IP/Input) and also a "sample" factor with two levels (treatment/control).

I've been trying to learn more about R, and this series of tutorials seems to cover building a difference-of-differences contrast (http://genomicsclass.github.io/book/pages/interactions_and_contrasts.html). I am not very well versed in R. Would it be possible to make this kind of contrast? And would this type of contrast even be valid?

Devon Ryan wrote (3.1 years ago):

Yeah, if you want to include the inputs then use them as a factor and keep them paired to their associated IP sample. The final structure should be like a human case-control or tumor-"surrounding tissue" experiment. Regarding the design in R, whether you need a contrast at all depends on how you set things up. I'd use something like ~IP*condition, where IP is a factor with ChIP and input levels and condition is whatever your experimental groups are. With that design you'd just take the interaction coefficient and not bother with a contrast.
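As a rough illustration of why the interaction coefficient in a ~IP*condition design is the difference of differences, here is a minimal Python sketch for a single peak. The counts, the level names and the helper functions `design_row` and `interaction_log2fc` are hypothetical and made up for this example. A real analysis would fit the model in a count-based package with replicates and dispersion estimates, which this sketch does not attempt.

```python
import math

# Hypothetical counts for one peak; the numbers are made up for illustration.
counts = {
    ("input", "control"): 20,
    ("ChIP", "control"): 80,
    ("input", "treatment"): 25,
    ("ChIP", "treatment"): 300,
}

def design_row(ip, condition):
    """One row of the ~IP*condition design matrix:
    [intercept, IP main effect, condition main effect, interaction]."""
    is_chip = 1 if ip == "ChIP" else 0
    is_treated = 1 if condition == "treatment" else 0
    return [1, is_chip, is_treated, is_chip * is_treated]

def interaction_log2fc(cell_counts):
    """With one value per cell the model is saturated, so the interaction
    coefficient equals the difference of differences on the log2 scale."""
    log2 = {cell: math.log2(value) for cell, value in cell_counts.items()}
    enrichment_treatment = log2[("ChIP", "treatment")] - log2[("input", "treatment")]
    enrichment_control = log2[("ChIP", "control")] - log2[("input", "control")]
    return enrichment_treatment - enrichment_control

for ip, condition in counts:
    print(ip, condition, design_row(ip, condition))
print("interaction (log2):", round(interaction_log2fc(counts), 3))
```

The interaction column is 1 only for treated ChIP libraries, which is why its coefficient captures the change in ChIP-over-input enrichment between conditions rather than the change in raw ChIP signal.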

BTW, be careful with the library size normalization. The naive method would be to scale everything based on counts from the peak regions, which will likely remove the difference between IP and input. You'd be better off getting counts in non-peak and non-blacklisted regions, using those to generate the normalization factors, and then applying those factors to the counts from peaks.
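The background-based normalization described above could be sketched roughly as follows. This is only an assumption-laden illustration: the library names, the counts and the functions `background_size_factors` and `normalize_peaks` are hypothetical, and scaling by total background reads relative to a geometric-mean reference is just one simple choice of estimator.

```python
import math

# Hypothetical read counts; every number here is made up for illustration.
# Background = bins outside peaks and outside blacklisted regions.
background_counts = {
    "control_input": [20, 24, 16, 20],
    "control_ChIP": [10, 12, 8, 10],
    "treatment_input": [20, 24, 16, 20],
    "treatment_ChIP": [10, 12, 8, 10],
}
peak_counts = {
    "control_input": [30, 40],
    "control_ChIP": [90, 120],
    "treatment_input": [30, 40],
    "treatment_ChIP": [300, 150],
}

def background_size_factors(background):
    """Scale each library by its total background count, relative to the
    geometric mean of all libraries' totals (so factors centre on 1)."""
    totals = {lib: sum(bins) for lib, bins in background.items()}
    log_mean = sum(math.log(t) for t in totals.values()) / len(totals)
    reference = math.exp(log_mean)
    return {lib: total / reference for lib, total in totals.items()}

def normalize_peaks(peaks, factors):
    """Divide peak counts by factors that were estimated from background only."""
    return {lib: [c / factors[lib] for c in cs] for lib, cs in peaks.items()}

factors = background_size_factors(background_counts)
normalized = normalize_peaks(peak_counts, factors)
for lib in peak_counts:
    print(lib, round(factors[lib], 3), [round(x, 1) for x in normalized[lib]])
```

Because the factors never see the peak counts, genuine IP-versus-input enrichment in the peaks is not scaled away. Count-based packages generally expect raw counts plus the factors rather than pre-divided counts, so the division here is only to make the effect visible.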

My guess is that including the inputs in this is mostly useful when the peak calling isn't that great... but I'd be curious to hear what you get :)


I am using the SES method from deepTools to calculate the input scaling. I think it is basically doing what you are suggesting: it separates coverage into background and signal components and scales based on the background. I am combining that with the spike-in normalization factors to calculate an overall set of scaling factors that should be consistent across all the samples.

Reply written 3.1 years ago by Damian Kao

I guess I can't complain if you're using deepTools ;)

Edit: I should warn you that SES is appropriate for generating a scaling factor between IP and input samples, but that won't help much for differences between input/IP pairs. If your sequencing depth is pretty uniform across the IP samples then this should be fine, but otherwise it might need some tweaking.

Reply written 3.1 years ago by Devon Ryan

Here is how I am generating the scaling factors. I hope this makes sense and is correct:

I calculated scaling factors for my treatment/control from spike-ins. For example:

Treatment : Control = 1 : 0.5

For treatment and control, I generated SES factors using the deepTools API:

Treatment IP : Treatment Input = 1 : 0.6
Control IP : Control Input = 1 : 0.5

To then get a consistent scaling factor among all treatment/control/ip/input, I just multiply the scaling factors from spike-in and SES:

Treatment IP = 1 (1 * 1)
Treatment Input = 0.6 (1 * 0.6)
Control IP = 0.5 (0.5 * 1)
Control Input = 0.25 (0.5 * 0.5)
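
The multiplication above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic using the example numbers from this post; the dictionary layout and the function name `combined_scaling_factors` are made up for illustration and are not part of deepTools.

```python
# Example factors copied from the post above.
spike_in = {"treatment": 1.0, "control": 0.5}
ses = {
    ("treatment", "IP"): 1.0,
    ("treatment", "input"): 0.6,
    ("control", "IP"): 1.0,
    ("control", "input"): 0.5,
}

def combined_scaling_factors(spike_in_factors, ses_factors):
    """Multiply each library's SES factor by its condition's spike-in factor."""
    return {
        (condition, library): spike_in_factors[condition] * factor
        for (condition, library), factor in ses_factors.items()
    }

combined = combined_scaling_factors(spike_in, ses)
for (condition, library), factor in combined.items():
    print(condition, library, factor)
```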

Reply written 3.1 years ago by Damian Kao

That seems reasonable enough :)

Reply written 3.1 years ago by Devon Ryan