Question: When Normalizing ChIP-Seq Replicates, Is It Better To Normalize All Reads, Or Only Reads In A Window Of Interest?
bede.portz (United States) wrote, 6.9 years ago:

I want to determine the best way to normalize ChIP-seq replicates that differ in total read count. I am analyzing ChIP-seq data for a factor that is found near transcription start sites (TSSs), and I focus my analysis on a relatively small window around TSSs, say +/- 1000 bp. Such experiments may yield 10 million mappable reads, of which fewer than 1 million map to that window. I want to normalize for differences in tag counts both between technical replicates and between replicates generated under different conditions.
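For concreteness, counting the reads that fall within +/- 1000 bp of a TSS could be sketched as below. This is a minimal illustration, not the poster's pipeline: it assumes each read has already been reduced to a single coordinate (for example its 5' end) and that reads and TSSs are held in memory as plain dicts keyed by chromosome. The function name `count_reads_near_tss` is hypothetical.

```python
from bisect import bisect_left


def count_reads_near_tss(read_positions, tss_positions, flank=1000):
    """Count reads lying within +/- `flank` bp of at least one TSS.

    Hypothetical helper. Assumes:
      read_positions: dict chrom -> list of single read coordinates (e.g. 5' ends)
      tss_positions:  dict chrom -> list of TSS coordinates
    Each read is counted at most once, even where TSS windows overlap.
    """
    in_window = 0
    for chrom, reads in read_positions.items():
        tss = sorted(tss_positions.get(chrom, []))
        if not tss:
            continue  # no TSSs annotated on this chromosome
        for pos in reads:
            i = bisect_left(tss, pos)
            # The nearest TSS is either just left (i - 1) or at/right (i) of pos.
            nearest = min(abs(pos - tss[j]) for j in (i - 1, i) if 0 <= j < len(tss))
            if nearest <= flank:
                in_window += 1
    return in_window
```

Counting each read once, rather than once per overlapping window, keeps this number usable as the "reads in window" total for normalization.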

One option is to normalize by total reads, i.e. scale each replicate to 10 million total reads and then map the reads to a window around TSSs. This method uses ALL the data for normalization, even though more than 90% of it will be discarded early in the analysis. I would therefore be normalizing the signal of interest by what amounts to a great deal of excess noise.
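Scaling to a common total depth amounts to multiplying every count by a single per-replicate factor, target / total mapped reads. A minimal sketch, assuming per-TSS window counts and the replicate's total mapped read count are already known (`scale_to_depth` is a hypothetical name):

```python
def scale_to_depth(window_counts, total_mapped, target=10_000_000):
    """Rescale counts as if the replicate had `target` total mapped reads.

    Hypothetical helper: the factor depends only on the full library size,
    so reads outside the TSS windows still influence the normalization.
    """
    factor = target / total_mapped
    return [c * factor for c in window_counts]


# A replicate with 20 million mapped reads is scaled down by a factor of 0.5.
print(scale_to_depth([100, 50], 20_000_000))
```

Up to a constant, this is the same as expressing counts in reads per million mapped reads.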

Alternatively, I could first map the reads to the TSSs within the window of interest and then normalize only the data that fall within this window. In this case I can first check how the proportion of tags within the window compares to the total number of tags in each replicate. If an equal proportion of total reads from each replicate maps to +/- 1 kb around TSSs, both methods should yield similar results. However, to me it makes sense to refine the data first, isolating the data I will ultimately analyze, and then normalize between replicates to adjust for read counts. This seems especially relevant when the biology predicts that replicates from different cellular conditions will differ in a narrow window around a subset of genes: a small percentage of a large dataset.
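The window-based alternative, together with the proportion check, might look like this. It is a sketch under the same assumptions as above (per-TSS counts already computed); the names and the default target of 1 million reads are hypothetical choices, not a standard.

```python
def fraction_in_window(reads_in_window, total_mapped):
    """Proportion of a replicate's mapped reads that fall in the TSS windows."""
    return reads_in_window / total_mapped


def normalize_by_window(per_tss_counts, reads_in_window=None, target=1_000_000):
    """Scale per-TSS counts so the reads in the windows sum to `target`.

    Hypothetical helper. `reads_in_window` should be the deduplicated total
    (each read counted once); summing per-TSS counts overcounts reads when
    neighbouring +/- 1 kb windows overlap.
    """
    if reads_in_window is None:
        reads_in_window = sum(per_tss_counts)
    factor = target / reads_in_window
    return [c * factor for c in per_tss_counts]


# Two replicates with the same fraction of reads in the windows (10%):
print(fraction_in_window(400, 4_000), fraction_in_window(800, 8_000))
print(normalize_by_window([300, 100], target=1_000))
print(normalize_by_window([600, 200], target=1_000))
```

When the fractions match, as here, the window-based and total-read scale factors differ only by a constant, so both methods rank and compare TSSs identically; a divergence in the fractions is the signal that the choice of method matters.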

Does a consensus exist as to the best approach?



Tags: normalization, chip-seq, rna-seq
Ying W (South San Francisco, CA) wrote, 6.9 years ago:

I don't think a consensus exists.

I've had good experience normalizing to all reads (raw or full library size) in conditions where the total amount of bound protein is changing. Ideally, normalizing to reads in the window (effective library size) should give results similar to normalizing to all reads, and this has been the case for some other ChIP-seq datasets I've analyzed. Since you pre-define your regions (+/- 1 kb around TSSs), you avoid the complication of a different number of binding regions between samples.
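A toy illustration of why full library size can matter when the total amount of bound protein changes: suppose, hypothetically, that condition B has twice as much factor bound at every TSS, with the same sequencing depth as condition A. The counts below are invented for illustration only.

```python
# Hypothetical toy data: (per-TSS window counts, total mapped reads).
cond_a = ([400, 600], 10_000)
cond_b = ([800, 1200], 10_000)  # twice the binding at every TSS, same depth


def total_scaled(counts, total, target=10_000):
    """Normalize to full (raw) library size."""
    return [c * target / total for c in counts]


def window_scaled(counts, target=1_000):
    """Normalize to reads in the windows (effective library size)."""
    s = sum(counts)
    return [c * target / s for c in counts]


fold_total = [b / a for a, b in zip(total_scaled(*cond_a), total_scaled(*cond_b))]
fold_window = [b / a for a, b in zip(window_scaled(cond_a[0]), window_scaled(cond_b[0]))]
print(fold_total)   # [2.0, 2.0]
print(fold_window)  # [1.0, 1.0]
```

Normalizing to full library size preserves the global two-fold increase, whereas normalizing to reads in the windows rescales it away. The converse risk is that full-library normalization is sensitive to changes in background reads outside the windows.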


Thanks Ying. This has been my experience as well, but my experience is extremely limited, so I thought I would pose the question.

(Reply by bede.portz, 6.9 years ago)