Question: ChIP-seq analysis with input and spike-in
Damian Kao wrote (18 months ago):

I have ChIP-seq data on histone modifications. I've been scouring the literature and blogs on ChIP-seq analysis that involves normalizing to input and normalizing across samples using spike-ins.

There doesn't seem to be a cohesive differential binding analysis approach that can incorporate input normalization along with spike-in normalization.

It seems most differential binding approaches apply RNA-seq methods (edgeR, DESeq2) to read counts over genomic windows. I can replace the normalization factors used in these RNA-seq packages with spike-in normalization factors, but how do I account for input? Is blacklisting sites that are not different from input really the best way? Transforming the counts relative to input via log2 fold change or subtraction is not statistically sound (other bioinformaticians seem to agree).
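
Substituting spike-in factors for a package's own normalization factors could look roughly like the minimal Python sketch below. It assumes spike-in chromatin was added in a fixed ratio to cell number (ChIP-Rx style), so a sample that yields more spike-in reads has proportionally less target signal per cell, and its window counts are divided by a factor proportional to its spike-in read count. The function names and numbers are hypothetical; in DESeq2, factors like these would stand in for the size factors it would otherwise estimate.

```python
import math


def spikein_size_factors(spike_reads):
    """Per-sample size factors proportional to spike-in read counts.

    spike_reads: reads mapped to the exogenous (spike-in) genome, one per sample.
    Centring the factors on their geometric mean mirrors how DESeq2-style
    size factors are usually scaled (an assumption of this sketch).
    """
    logs = [math.log(n) for n in spike_reads]
    geo_mean = math.exp(sum(logs) / len(logs))
    return [n / geo_mean for n in spike_reads]


def normalize_windows(window_counts, size_factors):
    """Divide each sample's window counts by that sample's size factor."""
    return [
        [count / sf for count in counts]
        for counts, sf in zip(window_counts, size_factors)
    ]


# Hypothetical data: two samples, counts in three genomic windows.
window_counts = [[100, 200, 50], [800, 1600, 400]]
spike_reads = [1_000, 4_000]

factors = spikein_size_factors(spike_reads)  # approximately [0.5, 2.0]
normalized = normalize_windows(window_counts, factors)
# approximately [[200, 400, 100], [400, 800, 200]]: sample 2 carries ~2x the mark
```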

I've looked at the input signal in my data and found signal patterns in regions consistent with some of my histone marks. This makes me think that I should normalize my IP to input before performing differential binding analysis.

The presence of bias in input samples also seems to be supported by this paper, which found that crosslinked, sonicated ChIP-seq samples (no IP) have signal corresponding to open chromatin.

Maybe input normalization isn't even necessary if we make the assumption that input is consistent across my different histone modification IPs? However, wouldn't that decrease the statistical power of the differential binding analysis?

This is my first time analyzing chip-seq data. Any thoughts on this from experts would be appreciated.

I'm not an expert, but I have been told not to use input for normalization across samples, and that its use is best limited to peak calling within conditions and to visualisation (to ensure that peaks in the IP are not also present in the input).

(Reply by Carlo Yague, 18 months ago)
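
To make the "input for peak calling within a condition" role concrete, here is a minimal sketch of a per-window enrichment test: the input count is scaled to the IP library depth and used as the local Poisson background rate, loosely in the spirit of MACS's local lambda but heavily simplified (no read shifting, no multiple background window sizes, no multiple-testing correction). The function names and the `min_lambda` floor are hypothetical choices for this illustration.

```python
import math


def log_poisson_pmf(i, lam):
    """log P(X = i) for X ~ Poisson(lam), computed in log space."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), with lam > 0."""
    if k <= 0:
        return 1.0
    if k <= lam:
        # Tail is large: use the complement of the lower sum.
        lower = sum(math.exp(log_poisson_pmf(i, lam)) for i in range(k))
        return max(0.0, 1.0 - lower)
    # Tail is small: terms decrease from k onward, so sum until negligible.
    total = 0.0
    i = k
    while True:
        term = math.exp(log_poisson_pmf(i, lam))
        total += term
        if term <= total * 1e-17:
            break
        i += 1
    return total


def window_pvalue(ip_count, input_count, ip_depth, input_depth, min_lambda=1.0):
    """Poisson p-value for IP enrichment in one window.

    The input count, scaled to the IP library depth, is the background rate;
    min_lambda is an arbitrary floor so empty input windows do not give lam = 0.
    """
    lam = max(input_count * ip_depth / input_depth, min_lambda)
    return poisson_upper_tail(ip_count, lam)


# Hypothetical window: 30 IP reads vs 20 input reads, input sequenced 2x deeper.
print(window_pvalue(30, 20, 10_000_000, 20_000_000))
```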

harold.smith.tarheel (United States) wrote (18 months ago):

I'm not sure there is a good method for incorporating both normalizations, because they serve different functions. The spike-in is designed for global assessment of differences, while input targets local differences. Spike-ins would allow you to detect an overall increase in (for example) H3K9me3 where the distribution of the mark is unchanged, whereas normalization to input by read depth would not. However, the increased effective read depth resulting from spike-in normalization would also be expected to produce broader peaks and, more problematically, some number of new peaks that now exceed the statistical threshold. And, as you noted, bias exists in the input sample, so omitting that control will produce false-positive peaks in the experimental sample.
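
The point about detecting a global increase can be illustrated with hypothetical numbers. The sketch assumes two conditions sequenced to the same depth, where condition B carries twice as much of the mark everywhere with an unchanged distribution, so B yields half as many spike-in reads relative to target reads. Plain read-depth (RPM) scaling sees no difference, while scaling by spike-in reads (reference-adjusted RPM in the style of ChIP-Rx) recovers the 2x change. All names and counts are illustrative.

```python
def rpm(counts, mapped_reads):
    """Reads per million mapped target-genome reads (depth normalization only)."""
    return [c * 1e6 / mapped_reads for c in counts]


def spike_rpm(counts, spike_reads):
    """Reads per million spike-in reads (reference-adjusted RPM, ChIP-Rx style)."""
    return [c * 1e6 / spike_reads for c in counts]


# Hypothetical: B has twice the mark everywhere, same distribution. At equal
# depth the window counts look identical, but B has half the spike-in share.
windows_a, mapped_a, spike_a = [100, 300], 1_000_000, 100_000
windows_b, mapped_b, spike_b = [100, 300], 1_000_000, 50_000

print(rpm(windows_a, mapped_a), rpm(windows_b, mapped_b))
# [100.0, 300.0] [100.0, 300.0] -> no change detected
print(spike_rpm(windows_a, spike_a), spike_rpm(windows_b, spike_b))
# [1000.0, 3000.0] [2000.0, 6000.0] -> the 2x global increase appears
```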

Our studies have largely involved changes in the distribution of marks, so we've always used input controls for peak calling. Perhaps users of spike-in controls will weigh in on their experiences.

Ryan Dale (Bethesda, MD) wrote (18 months ago):

I agree with Harold that spike-ins and inputs serve different purposes, and I don't know of any definitive answers on this. But the authors of DESeq2, csaw, and DiffBind have written some interesting material on this that might give you some ideas.

The argument is that normalizing to input for the purposes of differential binding has its own set of problems that may be worse than just assuming that the input doesn't change across treatments.

Maybe you could compare the effects of normalizing for trended biases versus composition biases to see whether the magnitude of the effects corresponds to the spike-in normalization factors? In any case, csaw seems like the best framework for experimenting with spike-ins for normalization (based on the quality of its documentation and the sophistication of its tools).
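
One way to start on the comparison suggested above, sketched under simplifying assumptions: estimate a log2 scaling factor between two samples from large background bins (a composition-bias-style normalization) and compare it with the log2 ratio of their spike-in reads. csaw's own composition normalization uses TMM on binned counts; the median of log ratios used here is a simpler, swapped-in estimator, and trended (loess-based) normalization is not covered. Function names and counts are hypothetical.

```python
import math
import statistics


def background_log2_factor(bins_a, bins_b):
    """Median log2(B/A) over background bins with reads in both samples.

    A simplified stand-in for composition-bias normalization; it is not TMM.
    """
    ratios = [math.log2(b / a) for a, b in zip(bins_a, bins_b) if a > 0 and b > 0]
    return statistics.median(ratios)


def spikein_log2_factor(spike_a, spike_b):
    """log2 ratio of spike-in reads, sample B relative to sample A."""
    return math.log2(spike_b / spike_a)


# Hypothetical counts in large background bins, plus spike-in totals.
bins_a = [100, 200, 400, 0]
bins_b = [200, 400, 800, 5]
bg = background_log2_factor(bins_a, bins_b)
spk = spikein_log2_factor(1_000, 2_000)
print(f"background: {bg:+.2f}, spike-in: {spk:+.2f}, difference: {bg - spk:+.2f}")
```

If the two factors agree, the background bins and the spike-ins imply the same scaling between samples; a large gap would hint at a global change that background-bin normalization, which assumes most bins are unchanged, would absorb.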

valentina.boeva wrote (10 weeks ago):

You can try HMCan-diff, which now accepts spike-in information. HMCan-diff also removes GC-content bias and copy-number bias. The latter can be important if your two conditions are normal and cancer cells. The HMCan paper was published in Nucleic Acids Research.

Powered by Biostar version 2.3.0