Question: ChIP-seq analysis with input and spike-in
Damian Kao (4.6 years ago) wrote:

I have ChIP-seq data on histone modifications. I've been scouring the literature and blogs on ChIP-seq analysis that involves normalizing to input and normalizing across samples using spike-ins.

There doesn't seem to be a cohesive differential binding analysis approach that can incorporate input normalization along with spike-in normalization.

It seems most differential binding approaches apply RNA-seq methods (edgeR, DESeq2) to read counts over genomic windows. I can replace the normalization factors used by these RNA-seq packages with spike-in normalization factors, but how do I account for input? Is blacklisting sites that are not different from input really the best way? Transforming the counts relative to input, via log2 fold change or subtraction, is not statistically sound (other bioinformaticians seem to agree).
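A minimal Python sketch of what substituting spike-in factors could look like, under stated assumptions: you already have per-window IP read counts and, for each sample, the number of reads that aligned to the spike-in (exogenous) genome. All sample names and numbers are hypothetical. Each size factor is taken as proportional to the sample's spike-in read count (the same logic as reference-adjusted RPM) and then centred on a geometric mean of 1; factors like these could be supplied to DESeq2 via `sizeFactors(dds) <- ...` instead of letting it estimate them from the IP counts.

```python
import math

# Hypothetical: reads aligned to the spike-in (exogenous) genome in each IP sample.
spike_reads = {"ctrl_1": 1_200_000, "ctrl_2": 1_000_000,
               "treat_1": 600_000, "treat_2": 500_000}

# Hypothetical: IP read counts over three genomic windows per sample.
window_counts = {
    "ctrl_1": [120, 60, 12],
    "ctrl_2": [100, 50, 10],
    "treat_1": [120, 30, 12],
    "treat_2": [100, 25, 10],
}


def spike_in_size_factors(spike_reads):
    """Size factors proportional to spike-in reads, rescaled to a geometric mean of 1."""
    logs = {sample: math.log(n) for sample, n in spike_reads.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {sample: math.exp(log_n - mean_log) for sample, log_n in logs.items()}


def normalize(window_counts, size_factors):
    """Divide every window count by its sample's size factor."""
    return {sample: [c / size_factors[sample] for c in counts]
            for sample, counts in window_counts.items()}


size_factors = spike_in_size_factors(spike_reads)
normalized = normalize(window_counts, size_factors)
for sample, values in normalized.items():
    print(sample, round(size_factors[sample], 3), [round(v, 1) for v in values])
```

With these toy numbers the treated samples have half the spike-in reads of the controls, so their normalized counts in the first window come out twice as high, while the second window comes out unchanged.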

I've looked at the input signal for my data and have found signal patterns in regions consistent with some of my histone marks. This makes me think that I should really normalize my IP to input before performing differential binding analysis.

The presence of binding bias in input samples also seems to be supported by this paper, in which crosslinked, sonicated ChIP-seq samples (no IP) showed signal corresponding to open chromatin.

Maybe input normalization isn't even necessary if we make the assumption that input is consistent across my different histone modification IPs? However, wouldn't that decrease the statistical power of the differential binding analysis?

This is my first time analyzing ChIP-seq data. Any thoughts on this from experts would be appreciated.

chip-seq • 6.4k views
Question written 4.6 years ago by Damian Kao; modified 3.0 years ago by nicolas.descostes

I'm not an expert, but I have been told not to use input for normalization across samples, and that its use is best limited to peak calling within conditions and to visualisation (to ensure that peaks in the IP are not also present in the input).
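In a window-based analysis, one way to follow this advice is to use the input only as a filter: keep windows where the IP is clearly enriched over input, then test the raw IP counts of the surviving windows without transforming them by the input. The 3-fold threshold, the pseudocount and the counts below are assumptions chosen for illustration; csaw offers its own control-based window filtering, which this sketch does not reproduce.

```python
import math


def enriched_windows(ip_counts, input_counts, ip_lib_size, input_lib_size,
                     min_log2_fc=math.log2(3), prior_count=1.0):
    """Return indices of windows whose IP signal exceeds input by >= min_log2_fc.

    Counts are converted to counts per million of their own library so that
    IP and input sequenced to different depths can be compared; prior_count
    avoids division by zero in empty windows.
    """
    keep = []
    for i, (ip, inp) in enumerate(zip(ip_counts, input_counts)):
        ip_cpm = (ip + prior_count) / ip_lib_size * 1e6
        input_cpm = (inp + prior_count) / input_lib_size * 1e6
        if math.log2(ip_cpm / input_cpm) >= min_log2_fc:
            keep.append(i)
    return keep


# Hypothetical counts over four windows; IP sequenced twice as deeply as input.
ip = [300, 40, 5, 90]
inp = [20, 35, 4, 10]
kept = enriched_windows(ip, inp, ip_lib_size=20_000_000, input_lib_size=10_000_000)
print(kept)  # [0, 3]: windows to carry forward into the differential test
```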

Reply written 4.6 years ago by Carlo Yague
nicolas.descostes (United States, 3.0 years ago) wrote:

We have started to develop a package for this: ChIPSeqSpike.


Hi nicolas.descostes, I have a problem: I want to use MACS to call peaks from the results of ChIPSeqSpike, but I do not know how to design the downstream analysis. Your help would be appreciated.

Reply written 3.0 years ago by huangzy628


One solution would be to use the bamCoverage tool from deepTools to obtain bedGraph files, then convert them to BED and run MACS2. You can pass the scaling factors reported by ChIPSeqSpike's spikeSummary function to bamCoverage (through its --scaleFactor option).
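A minimal sketch of the bamCoverage step, assuming deepTools is installed and on the PATH: it builds one command per BAM that writes a bedGraph scaled with `--scaleFactor`. The file names and factor values are placeholders, and which of the factors reported by spikeSummary to use is left open here; the conversion to BED and the MACS2 step are not shown.

```python
import shlex
import subprocess


def bamcoverage_cmd(bam, out_bedgraph, scale_factor, bin_size=50):
    """Build a deepTools bamCoverage call that writes a bedGraph scaled by scale_factor."""
    return [
        "bamCoverage",
        "--bam", bam,
        "--outFileName", out_bedgraph,
        "--outFileFormat", "bedgraph",
        "--binSize", str(bin_size),
        "--scaleFactor", str(scale_factor),
    ]


def run(cmd, dry_run=True):
    """Print the shell-quoted command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Hypothetical BAM files and scaling factors (e.g. taken from spikeSummary output).
scaling = {"ip_ctrl.bam": 0.85, "ip_treat.bam": 1.6}
commands = [bamcoverage_cmd(bam, bam.replace(".bam", ".bedgraph"), factor)
            for bam, factor in scaling.items()]
for cmd in commands:
    run(cmd)  # dry run: prints the commands without calling deepTools
```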

Reply written 3.0 years ago by nicolas.descostes

Thanks a lot for your answer. I also have a couple of other problems:
1. Does "test_coord.gff" come from Ensembl? If not, how do I get the GFF file?
2. I put the genome file (e.g. hg19.fa) in the extdata directory of the ChIPSeqSpike package. After running the example in ChIPSeqSpike, I get the error "Error in getPlotSetArray(tracks = files, features = gff_vec, refgenome = genome_version, : No genomes installed!".
How can I solve these problems? Your help would be appreciated.

Reply written 3.0 years ago by huangzy628

Can you contact me by Gmail? Please send your message again there; it will be easier.

Reply written 3.0 years ago by nicolas.descostes

Thank you very much. I have sent the questions to your email.

Reply written 3.0 years ago by huangzy628

Hi Nicolas, can you specify which scaling factor (i.e., endoScalFact vs. exoScalFact, or the ratio of the Exo percentages?) should be used with bamCoverage?

Thanks a lot! Xiaoyong Fu

Reply written 2.5 years ago by xiaoyonf

Hi Nicolas, I am starting to use ChIPSeqSpike in R, but I am stuck on the error "The info file should be in csv or txt format." This error appears in the quick start when running the spikePipe command. I would appreciate your help in solving this problem.

Thanks, Xiaoyong Fu, Baylor College of Medicine

Reply written 2.5 years ago by xiaoyonf
harold.smith.tarheel (United States, 4.6 years ago) wrote:

I'm not sure there is a good method for incorporating both normalizations, because they serve different functions. The spike-in is designed for assessing global differences, while input targets local differences. Spike-ins would allow you to detect an overall increase in (for example) H3K9me3 where the distribution of the mark is unchanged, whereas normalization to input by read depth would not. However, the increased read depth resulting from spike-in normalization would also be expected to produce broader peaks and (more problematically) some number of new peaks that now exceed the statistical threshold. And, as you noted, bias exists in the input sample, so excluding that control will produce false-positive peaks in the experimental sample.
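A toy calculation makes the first point concrete. It assumes an idealized case in which the mark doubles everywhere in the treated sample, the same amount of spike-in chromatin is added to both samples, and the four listed regions stand in for the whole genome; all counts are invented. Normalizing by endogenous read depth makes the samples look identical, while scaling by spike-in reads (in the spirit of ChIP-Rx's reference-adjusted RPM) recovers the twofold increase.

```python
def rpm(counts):
    """Scale region counts to reads per million of the sample's endogenous reads."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


def spike_normalize(counts, spike_reads):
    """Scale region counts by 1e6 / spike-in reads (reference-adjusted RPM style)."""
    return [c * 1e6 / spike_reads for c in counts]


# Hypothetical toy data: the mark doubles everywhere in "treat" while its
# distribution across the four regions stays the same; the same amount of
# spike-in chromatin was added to both samples, yielding 200 spike-in reads each.
ctrl = {"regions": [400, 300, 200, 100], "spike": 200}
treat = {"regions": [800, 600, 400, 200], "spike": 200}

rpm_fold = [t / c for t, c in zip(rpm(treat["regions"]), rpm(ctrl["regions"]))]
spike_fold = [t / c for t, c in zip(spike_normalize(treat["regions"], treat["spike"]),
                                    spike_normalize(ctrl["regions"], ctrl["spike"]))]
print("depth-normalized fold change:", rpm_fold)
print("spike-in-normalized fold change:", spike_fold)
```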

Our studies have largely involved changes in the distribution of marks, so we've always used input controls for peak calling. Perhaps users of spike-in controls will weigh in on their experiences.

Ryan Dale (Bethesda, MD, 4.6 years ago) wrote:

I agree with Harold that spike-ins and inputs serve different purposes, and I don't know of any definitive answers on this. But here is some interesting reading from the authors of DESeq2, csaw, and DiffBind that might give you some ideas.

The argument is that normalizing to input for the purposes of differential binding has its own set of problems that may be worse than just assuming that the input doesn't change across treatments.

Maybe you could compare the effects of normalizing for trended biases vs. composition biases to see whether the magnitude of the effects corresponds to the spike-in normalization factors? In any case, csaw seems like the best framework for experimenting with spike-ins for normalization (based on the quality of its documentation and the sophistication of its tools).
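One way to try this comparison is to estimate composition-style factors from counts in large background bins and put them next to the spike-in factors. csaw does this with TMM; the sketch below swaps in the simpler DESeq-style median-of-ratios estimator, and all counts and spike-in totals are hypothetical. Both sets of factors are centred on a geometric mean of 1 so that only their relative values matter; a sample whose ratio stands out (treat_b here) is one where the spike-in suggests a global change that background-bin normalization would remove.

```python
import math
import statistics


def median_of_ratios(bin_counts):
    """DESeq-style size factors from counts in large background bins.

    bin_counts maps sample -> list of counts (same bins in every sample);
    bins with a zero in any sample are skipped.
    """
    samples = list(bin_counts)
    n_bins = len(bin_counts[samples[0]])
    log_ratios = {s: [] for s in samples}
    for i in range(n_bins):
        values = [bin_counts[s][i] for s in samples]
        if min(values) == 0:
            continue
        log_geo_mean = sum(math.log(v) for v in values) / len(values)
        for s, v in zip(samples, values):
            log_ratios[s].append(math.log(v) - log_geo_mean)
    return {s: math.exp(statistics.median(r)) for s, r in log_ratios.items()}


def center(factors):
    """Rescale factors to a geometric mean of 1 so different sets are comparable."""
    mean_log = sum(math.log(f) for f in factors.values()) / len(factors)
    return {s: math.exp(math.log(f) - mean_log) for s, f in factors.items()}


# Hypothetical background-bin counts and spike-in read totals for three samples.
bins = {
    "ctrl": [100, 220, 310, 90, 150],
    "treat_a": [200, 440, 620, 180, 300],
    "treat_b": [150, 330, 465, 135, 225],
}
spike = {"ctrl": 1_000_000, "treat_a": 2_000_000, "treat_b": 750_000}

composition = center(median_of_ratios(bins))
spike_factors = center(spike)
for s in bins:
    # A ratio far from the others flags a sample where the two normalizations disagree.
    print(s, round(composition[s], 3), round(spike_factors[s], 3),
          round(composition[s] / spike_factors[s], 2))
```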

valentina.boeva (3.3 years ago) wrote:

You can try HMCan-diff, which now accepts spike-in information. HMCan-diff also removes GC-content bias and copy-number bias. The latter can be important if your two conditions are normal and cancer cells. See the HMCan paper in Nucleic Acids Research.
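HMCan-diff's own corrections are described in its paper and are not reproduced here. To illustrate what removing a GC-content bias means in general, the sketch below applies a simple stratified correction, assuming you already have per-window counts and GC fractions in [0, 1]; the data are hypothetical, and copy-number correction is not covered.

```python
from collections import defaultdict


def gc_correct(counts, gc_fractions, n_strata=10):
    """Stratified GC correction (a generic sketch, not HMCan-diff's method).

    Windows are grouped by GC fraction into n_strata equal-width strata;
    each count is multiplied by global_mean / stratum_mean, so every stratum
    ends up with the same average signal.
    """
    strata = [min(int(g * n_strata), n_strata - 1) for g in gc_fractions]
    totals, sizes = defaultdict(float), defaultdict(int)
    for c, s in zip(counts, strata):
        totals[s] += c
        sizes[s] += 1
    global_mean = sum(counts) / len(counts)
    corrected = []
    for c, s in zip(counts, strata):
        stratum_mean = totals[s] / sizes[s]
        corrected.append(c * global_mean / stratum_mean if stratum_mean > 0 else 0.0)
    return corrected


# Hypothetical windows: GC-rich windows get roughly twice the reads from GC bias alone.
gc = [0.35, 0.38, 0.62, 0.65]
counts = [10, 12, 22, 20]
corrected = gc_correct(counts, gc)
print([round(x, 2) for x in corrected])
```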
