Seeking feedback on ChIP-seq normalization method: Calculating scaling factors by dividing input by IP, including spike-in
8 months ago
kalavattam ▴ 190

I've been discussing ChIP-seq normalization methods with colleagues, and they introduced me to a normalization approach that involves calculating scaling factors by dividing a ratio of input spike-in counts by a ratio of IP spike-in counts. This method is new to me, and I'm trying to gain a better understanding of its validity and potential use in ChIP-seq analyses.

Specifically, we take the following steps for each sample's input and immunoprecipitate (IP), both of which have been spiked with chromatin from a different species:

First, tally the numbers of genome-wide quality-checked (QC'd) alignments for input, immunoprecipitate (IP), spike-in input, and spike-in IP

## R pseudocode ##
 main_input <- # Integer value of QC'd genome-wide alignments for "main" input
    main_IP <- # Integer value of QC'd genome-wide alignments for "main" IP

spike_input <- # Integer value of QC'd genome-wide alignments for spike-in input
   spike_IP <- # Integer value of QC'd genome-wide alignments for spike-in IP

These alignment counts represent the raw data for each sample.

Second, calculate the ratio of spike-in input to "main" input, and calculate the ratio of spike-in IP to "main" IP

## R pseudocode ##
ratio_input <- spike_input / main_input
   ratio_IP <- spike_IP / main_IP

The assumption here is that the spike-in counts are proportional to the actual chromatin content, so the input and IP ratios should ideally be similar.

Third, for the sample IP, calculate a scaling factor by dividing the input ratio by the IP ratio

## R pseudocode ##
      sf_IP <- ratio_input / ratio_IP

This scaling factor aims to correct for potential differences in IP efficiency and sequencing depth between the input and IP samples.

Fourth, for the sample input, set the scaling factor to 1

## R pseudocode ##
   sf_input <- 1 # (i.e., from ratio_input / ratio_input)

Since the scaling factor for the input is calculated as 1, the input coverage will remain unchanged after scaling.
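For concreteness, here is one way to sketch the four steps above as runnable Python; the alignment counts are made-up toy values, not from a real experiment:

```python
def spike_in_scaling_factor(main_input, main_ip, spike_input, spike_ip):
    """Steps 1-3: divide the input spike-in ratio by the IP spike-in ratio."""
    ratio_input = spike_input / main_input  # step 2
    ratio_ip = spike_ip / main_ip           # step 2
    return ratio_input / ratio_ip           # step 3

sf_input = 1.0  # step 4: input coverage is left unscaled

# Hypothetical toy counts; substitute your own QC'd alignment tallies
sf_ip = spike_in_scaling_factor(
    main_input=10_000_000, main_ip=30_000_000,
    spike_input=400_000, spike_ip=700_000,
)
print(round(sf_ip, 3))  # 0.04 / 0.0233... -> 1.714
```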

Finally, using deepTools bamCoverage, compute IP coverage by multiplying by the IP scaling factor

## shell pseudocode ##
main_IP_bam= # QC'd IP alignments to "main" genome (spike-in IP alignments have been excluded)
   bin_size= # For example, 1
      sf_IP= # Value calculated in step #3
 main_IP_bw= # Bigwig file of scaled coverage 

bamCoverage \
    -b "${main_IP_bam}" \
    --binSize "${bin_size}" \
    --scaleFactor "${sf_IP}" \
    -o "${main_IP_bw}"

This should theoretically normalize the IP coverage based on the spike-in controls.

I've done some searching, but so far I haven't found any publications or resources that describe this particular normalization method. I'm curious to know if any of you have come across or used this approach in your ChIP-seq analyses. Are there any potential benefits or limitations associated with this method?

I would appreciate any insights and feedback on this topic, which will help me make informed decisions about normalization approaches for my ChIP-seq data. Thank you in advance for your time and assistance.

coverage spike-in scaling-factor ChIP-seq

This is largely how our group uses spike-ins to account for genome-wide shifts, as done in this study. If we don't normalize the tracks this way, they look nearly identical, as the binding profiles are broadly similar, just very diminished genome-wide.

We also normalize the scaling factors to each other so that the highest scaling factor is set to 1. It feels better to scale down data than to scale up:

  1. Calculate the percent drosophila reads in the IP (from the read counts with duplicates removed)
  2. Calculate the percent drosophila reads in the input (from the read counts with duplicates removed)
  3. input fly%/ChIP fly% = scaling factor
  4. For groups of samples that are being compared, all the scaling factors can be normalized by dividing by the highest or lowest scaling factor - this choice depends on how the downstream application will use the scaling factor. Ideally you should scale files down, not falsely inflate them. (For deepTools bamCoverage, you should divide all scaling factors by the highest scaling factor.)
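As a sanity check, those four steps can be scripted roughly as follows in Python, using the same toy human/fly read counts as the example calculation below (small differences from the hand-rounded values in the example are expected):

```python
samples = {  # (human_reads, fly_reads) per ChIP and matched input; toy values
    "ChIP_1": (30_000_000, 700_000), "Input_1": (10_000_000, 400_000),
    "ChIP_2": (40_000_000, 3_000_000), "Input_2": (10_000_000, 450_000),
    "ChIP_3": (25_000_000, 1_000_000), "Input_3": (10_000_000, 380_000),
}

def pct_fly(human, fly):
    # Steps 1-2: percent fly reads out of *all* reads (human + fly)
    return 100 * fly / (human + fly)

# Step 3: input fly% / ChIP fly% = scaling factor
sf = {
    f"ChIP_{i}": pct_fly(*samples[f"Input_{i}"]) / pct_fly(*samples[f"ChIP_{i}"])
    for i in (1, 2, 3)
}

# Step 4: divide by the highest scaling factor so everything scales down
top = max(sf.values())
norm = {k: v / top for k, v in sf.items()}
for k in sf:
    print(k, round(sf[k], 2), round(norm[k], 3))
```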

An example calculation:

Determine % of fly reads for each ChIP and input:

ChIP 1: human reads = 30,000,000; fly reads = 700,000; 700,000/30,700,000 = 2.28% fly reads

Input 1: human reads = 10,000,000; fly reads = 400,000; 400,000/10,400,000 = 3.85% fly reads

ChIP 2: human reads = 40,000,000; fly reads = 3,000,000; 3,000,000/43,000,000 = 6.98% fly reads

Input 2: human reads = 10,000,000; fly reads = 450,000; 450,000/10,450,000 = 4.31% fly reads

ChIP 3: human reads = 25,000,000; fly reads = 1,000,000; 1,000,000/26,000,000 = 3.85% fly reads

Input 3: human reads = 10,000,000; fly reads = 380,000; 380,000/10,380,000 = 3.66% fly reads

Determine scaling factor for each ChIP (input fly % / ChIP fly % = scaling factor):

ChIP 1 scaling factor = 3.85/2.28 = 1.69

ChIP 2 scaling factor = 4.31/6.98 = 0.62

ChIP 3 scaling factor = 3.66/3.85 = 0.95

Normalize the scaling factors by dividing all of them by the highest scaling factor (this is used for bamCoverage; if the scaling factor is in the denominator instead, scale by the lowest scaling factor):

ChIP 1 normalized scaling factor = 1.69/1.69 = 1

ChIP 2 normalized scaling factor = 0.62/1.69 = 0.367

ChIP 3 normalized scaling factor = 0.95/1.69 = 0.562

Thank you for this. In your example, you divide the fly counts by all counts (i.e., human plus fly counts) rather than human counts alone, which was what my colleagues described. Is there a reason you use the former denominator over the latter?

Because the spike-in alignment counts are low to begin with, the resulting scaling factors are similar if using (a) only human counts or (b) all counts for the denominator.

(a) Using human counts in the denominator

Determine % of fly reads for each ChIP and input:

| Sample  | Human Reads | Fly Reads | Calculation            | % Fly Content |
|---------|-------------|-----------|------------------------|---------------|
| ChIP_1  | 30,000,000  | 700,000   | 700,000 / 30,000,000   | 2.33%         |
| Input_1 | 10,000,000  | 400,000   | 400,000 / 10,000,000   | 4.00%         |
| ChIP_2  | 40,000,000  | 3,000,000 | 3,000,000 / 40,000,000 | 7.50%         |
| Input_2 | 10,000,000  | 450,000   | 450,000 / 10,000,000   | 4.50%         |
| ChIP_3  | 25,000,000  | 1,000,000 | 1,000,000 / 25,000,000 | 4.00%         |
| Input_3 | 10,000,000  | 380,000   | 380,000 / 10,000,000   | 3.80%         |

Determine scaling factor for each ChIP (input fly % / ChIP fly % = scaling factor):

| Sample  | Calculation | Scaling Factor |
|---------|-------------|----------------|
| ChIP_1  | 4.00 / 2.33 | 1.72           |
| ChIP_2  | 4.50 / 7.50 | 0.600          |
| ChIP_3  | 3.80 / 4.00 | 0.950          |

Normalize scaling factors by dividing all scaling factors by highest scaling factor (this is used for bamCoverage—if the scaling factor is in the denominator instead, scale by the lowest scale factor):

| Sample  | Calculation | Normalized Scaling Factor |
|---------|-------------|---------------------------|
| ChIP_1  | 1.72 / 1.72 | 1.00                      |
| ChIP_2  | 0.60 / 1.72 | 0.349                     |
| ChIP_3  | 0.95 / 1.72 | 0.552                     |

(b) Using all counts in the denominator

Determine % of fly reads for each ChIP and input:

| Sample  | Human Reads | Fly Reads | Calculation          | % Fly Content |
|---------|-------------|-----------|----------------------|---------------|
| ChIP_1  | 30,000,000  | 700,000   | 700,000/30,700,000   | 2.28%         |
| Input_1 | 10,000,000  | 400,000   | 400,000/10,400,000   | 3.85%         |
| ChIP_2  | 40,000,000  | 3,000,000 | 3,000,000/43,000,000 | 6.98%         |
| Input_2 | 10,000,000  | 450,000   | 450,000/10,450,000   | 4.31%         |
| ChIP_3  | 25,000,000  | 1,000,000 | 1,000,000/26,000,000 | 3.85%         |
| Input_3 | 10,000,000  | 380,000   | 380,000/10,380,000   | 3.66%         |

Determine scaling factor for each ChIP (input fly % / ChIP fly % = scaling factor):

| Sample  | Calculation | Scaling Factor |
|---------|-------------|----------------|
| ChIP_1  | 3.85/2.28   | 1.69           |
| ChIP_2  | 4.31/6.98   | 0.62           |
| ChIP_3  | 3.66/3.85   | 0.95           |

Normalize scaling factors by dividing all scaling factors by highest scaling factor (this is used for bamCoverage—if the scaling factor is in the denominator instead, scale by the lowest scale factor):

| Sample  | Calculation | Normalized Scaling Factor |
|---------|-------------|---------------------------|
| ChIP_1  | 1.69/1.69   | 1.00                      |
| ChIP_2  | 0.62/1.69   | 0.367                     |
| ChIP_3  | 0.95/1.69   | 0.562                     |
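The two variants can also be computed side by side in a short Python sketch (same toy counts as above; values agree with tables (a) and (b) up to rounding), which makes it easy to see that the choice of denominator matters more as the spike-in fraction grows:

```python
samples = {  # (human_reads, fly_reads); toy values from the tables above
    "ChIP_1": (30_000_000, 700_000), "Input_1": (10_000_000, 400_000),
    "ChIP_2": (40_000_000, 3_000_000), "Input_2": (10_000_000, 450_000),
    "ChIP_3": (25_000_000, 1_000_000), "Input_3": (10_000_000, 380_000),
}

def normalized_sf(include_fly_in_denominator):
    def pct(human, fly):
        # (a) human-only vs (b) human + fly denominator
        denom = human + fly if include_fly_in_denominator else human
        return 100 * fly / denom
    sf = {
        f"ChIP_{i}": pct(*samples[f"Input_{i}"]) / pct(*samples[f"ChIP_{i}"])
        for i in (1, 2, 3)
    }
    top = max(sf.values())  # scale everything down by the highest factor
    return {k: round(v / top, 3) for k, v in sf.items()}

human_only = normalized_sf(False)  # table (a)
all_reads = normalized_sf(True)    # table (b)
for k in human_only:
    print(k, human_only[k], all_reads[k], round(all_reads[k] - human_only[k], 3))
```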
> Thank you for this. In your example, you divide the fly counts by all counts (i.e., human plus fly counts) rather than human counts alone, which was what my colleagues described. Is there a reason you use the former denominator over the latter?

Well, the human and fly reads are still from one sample, so it makes sense to take them both into account to address total read depth differences (as one sample could have many more fly reads than another if the global shift is large). We can get pretty dramatic shifts where fly counts will be >5x higher in a sample with widespread depletion than in a wildtype, because the IP doesn't work as well due to much less protein. That represents a biological shift and is information that shouldn't be tossed out.

In this example, it doesn't make much of a difference in the final number, but in some experiments, I expect it'd be a more significant shift. In your example, you can already see a larger impact on samples with a greater proportion of fly reads - change of 0.018 for ChIP_2, 0.01 for ChIP_3.

I can't think of any compelling reasons to not include the fly reads - it may be worth asking your colleagues why they do not.

Thank you, makes sense.

> In this example, it doesn't make much of a difference in the final number, but in some experiments, I expect it'd be a more significant shift. In your example, you can already see a larger impact on samples with a greater proportion of fly reads - change of 0.018 for ChIP_2, 0.01 for ChIP_3.
>
> I can't think of any compelling reasons to not include the fly reads - it may be worth asking your colleagues why they do not.

Good point, I will do so.

Thanks again, jared.andrews07

> In this example, it doesn't make much of a difference in the final number, but in some experiments, I expect it'd be a more significant shift. In your example, you can already see a larger impact on samples with a greater proportion of fly reads - change of 0.018 for ChIP_2, 0.01 for ChIP_3.

In your previous work, did you happen to work with any ChIP-seq datasets where this shift is demonstrable?

One of the goals of my current work is—in addition to standard analyses—to provide my colleagues with a kind of tutorial for basic ChIP-seq analyses. I'd like to present and demonstrate the point that, with this approach to normalization, it's preferable to divide by the total counts (i.e., the "main" plus spike-in counts) rather than just the "main" counts to address read-depth differences. The shift is not apparent with the datasets I'm currently working with—i.e., dividing by either total or "main" yields similar scaling coefficients. I want to clearly show this because some colleagues have been dividing by only "main" counts and getting (apparently) reasonable results; I want to make the point that more extreme circumstances will yield inappropriate scaling coefficients.

So, if you happen to have encountered or know of any publicly available datasets useful for this purpose, would you mind pointing me to their papers or accession numbers?

The study I linked to in my answer has some ChIP-seq data that will have some skew, as it contains data for a mutation that ablates H3K27me3 across the genome (with focal retention at certain loci, so not completely gone). So the samples with the mutation tend to have more fly reads than those without, as the IP just doesn't pull down as much.

If a rather extreme toy example is good enough, let's use the calculations in your comment above. I kicked the ChIP_2 fly reads up to 9 million, and using all reads results in a normalized scaling factor of 0.144 while using only the human reads results in 0.116. Not a huge absolute difference, but noticeable compared to the difference between using all reads and only human reads seen for ChIP_3 (0.028 for ChIP_2 versus 0.01 for ChIP_3).

Sounds good, thank you so much!

Perhaps this question is better suited for a new post, but since it follows on this work, I will post it here for now and move it if requested.

If I wanted to use DESeq2 for a differential binding analysis by generating a matrix of ChIP counts per n-bp bins, for example:

| chr   | start | stop  | ChIP_1_WT | ChIP_2_WT | ChIP_3_KO | ChIP_4_KO |
|-------|-------|-------|-----------|-----------|-----------|-----------|
| chr1  | 1     | 150   | 300       | 400       | 1200      | 1800      |
| chr1  | 151   | 300   | 250       | 433       | 1000      | 1100      |
| ...   | ...   | ...   | ...       | ...       | ...       | ...       |
| chrX  | 45001 | 45150 | 15000     | 18000     | 5500      | 4500      |
| chrX  | 45151 | 45300 | 12500     | 17500     | 5000      | 4500      |
| ...   | ...   | ...   | ...       | ...       | ...       | ...       |

Is it reasonable for me to supply the scaling factors calculated by the above-described method rather than use the values generated by DESeq2::estimateSizeFactors() (using either the human "main" counts or the fly spike-in counts)?
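One caveat worth flagging if you go that route (a sketch under my assumptions, not a vetted recipe): DESeq2 *divides* counts by size factors, whereas bamCoverage *multiplies* by scale factors, so scaling factors supplied via `sizeFactors(dds) <- ...` in R would need to be reciprocals of the bamCoverage-style factors, optionally re-centered on their geometric mean to stay on a scale comparable to what `estimateSizeFactors()` produces. The arithmetic, sketched in Python with toy values:

```python
import math

# Multiplicative spike-in scaling factors as used with bamCoverage (toy values)
sf_multiplicative = {"ChIP_1": 1.69, "ChIP_2": 0.62, "ChIP_3": 0.95}

# DESeq2 size factors divide the counts, so invert the multiplicative factors
size_factors = {k: 1 / v for k, v in sf_multiplicative.items()}

# Optionally re-center on the geometric mean so the factors multiply to ~1,
# keeping the normalized counts on a scale similar to DESeq2's own convention
gm = math.prod(size_factors.values()) ** (1 / len(size_factors))
size_factors_centered = {k: v / gm for k, v in size_factors.items()}
print({k: round(v, 3) for k, v in size_factors_centered.items()})
```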

I'd probably use DiffBind and use the spike-ins in the normalization function (it uses DESeq2 on the backend). It limits the analysis to peak regions, though, so depending on what you're trying to do, you may have to adjust.

Hi jared.andrews07 — thanks again for your advice and guidance.

A couple quick follow-up questions regarding these points:

> For groups of samples that are being compared, all the scaling factors can be normalized by dividing by the highest or lowest scaling factor...
>
> ...
>
> Normalize the scaling factors by dividing all of them by the highest scaling factor (this is used for bamCoverage; if the scaling factor is in the denominator instead, scale by the lowest scaling factor):
>
> ChIP 1 normalized scaling factor = 1.69/1.69 = 1
>
> ChIP 2 normalized scaling factor = 0.62/1.69 = 0.367
>
> ChIP 3 normalized scaling factor = 0.95/1.69 = 0.562

#1 — If we wanted to plot input coverage alongside ChIP coverage, would you also scale input down in this manner?

For example...

  • Input 1 normalized scaling factor = 1/1.69 = 0.592
  • Input 2 normalized scaling factor = 1/1.69 = 0.592
  • Input 3 normalized scaling factor = 1/1.69 = 0.592

#2 — And if we wanted to examine the log2 ratios of ChIP to input with, e.g., bigwigCompare, we'd use the scaled-down ChIP and input bigwigs? (However, to identify these kinds of positional changes in signal with confidence, I think a statistical approach (e.g., via DiffBind, DESeq2, csaw, etc.) is better suited than visualizing ratios of ChIP coverage to input coverage—but I'm asking because this is what my bench biologist colleagues are used to.)

  1. I typically don't bother normalizing the input tracks, so it's kind of your call. I suppose I would scale them similarly.

  2. I do the comparisons with the tools you mentioned to find actual differential loci, then use the normalized tracks if I need to actually show the profile/tracks in a figure. I don't do direct comparison of the tracks - they are just a nice visual aid to go along with the statistical methods.
