Spike-in normalization with Cut&Tag data
2.1 years ago

Hello, I am working on Cut&Tag data that includes a spike-in for normalization. The spike-in is the Ampr gene from the plasmid pBluescript(+). I have 4 samples, Wt[1,2] and Ko[1,2], all of which were generated using the same antibody. What I have done so far for each sample is:

  1. Aligned my reads to mm10 (Bowtie2)

  2. Aligned my reads to Ampr gene from pBluescript (Bowtie2).

I want to normalize these 4 samples using scaling factors calculated from the spike-in data. How should I go about doing that? From my research, I found that normalization factors are calculated as follows:

normalization factor = lowest_sample (spike-in) /sample_of_interest (mm10) (https://www.biostars.org/p/247172/),

where the lowest sample is the sample with the lowest Spike-in counts and the sample_of_interest is the counts of each sample.

In the hypothetical example below, if each value is a read count from Bowtie2 (paired-end, uniquely aligned), would scaling method A make sense? Or should I use method B, or neither?

        mm10   Spike-in   Scaling Factor *A*   OR   Scaling Factor *B*
Wt1     70     5          5/5 = 1                   70/5
Wt2     80     7          5/7 = 0.7                 80/7
Ko1     30     6          5/6 = 0.8                 30/6
Ko2     40     6          5/6 = 0.8                 40/6
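
To make method A concrete, here is a minimal Python sketch of how I understand it: the lowest spike-in count divided by each sample's spike-in count. The counts are the hypothetical ones from the table, and the function name `method_a_factors` is just something I made up for illustration.

```python
# Hypothetical spike-in read counts copied from the table above (not real data).
spike_in_counts = {"Wt1": 5, "Wt2": 7, "Ko1": 6, "Ko2": 6}


def method_a_factors(spike_counts):
    """Method A: lowest spike-in count divided by each sample's spike-in count."""
    if not spike_counts:
        raise ValueError("no samples given")
    if min(spike_counts.values()) <= 0:
        raise ValueError("spike-in counts must be positive")
    lowest = min(spike_counts.values())
    return {sample: lowest / count for sample, count in spike_counts.items()}


factors = method_a_factors(spike_in_counts)
for sample, factor in factors.items():
    print(f"{sample}\t{factor:.2f}")
```

With these numbers the sample with the fewest spike-in reads (Wt1) gets a factor of 1, and every other sample is scaled down.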

I would greatly appreciate advice on whether my current idea for normalization is correct or not. If not, could you point me in the right direction?

Is there a way to use deeptools to do this?
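
From what I can tell, deeptools `bamCoverage` has a `--scaleFactor` option that multiplies the coverage by a user-supplied number, so one possible route would be to pass the method A factor per sample. The sketch below only builds and prints the command lines; the BAM and bigWig file names are hypothetical placeholders, the bin size is an arbitrary choice, and whether method A is the right factor is exactly what I am asking.

```python
import shlex

# Method A factors from the hypothetical table (lowest spike-in / sample spike-in).
factors = {"Wt1": 5 / 5, "Wt2": 5 / 7, "Ko1": 5 / 6, "Ko2": 5 / 6}


def bamcoverage_command(sample, factor, bin_size=10):
    """Build a bamCoverage call for one sample (file names are placeholders)."""
    return [
        "bamCoverage",
        "--bam", f"{sample}.mm10.bam",          # hypothetical mm10 alignment file
        "--outFileName", f"{sample}.scaled.bw",  # hypothetical output bigWig
        "--scaleFactor", f"{factor:.4f}",
        "--binSize", str(bin_size),
    ]


commands = {s: bamcoverage_command(s, f) for s, f in factors.items()}
for cmd in commands.values():
    # Print the shell command instead of running it, so this works without deeptools.
    print(shlex.join(cmd))
```

I assume no other normalization (`--normalizeUsing`) should be combined with this, otherwise the spike-in scaling would be applied on top of a second normalization.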

Any help will be greatly appreciated.

deeptools normalization ChIP-Seq cutandtag • 2.0k views

This is more a comment than a question, but I never really understood why the use of spike-ins in routine experiments would be meaningful. You add a constant amount of spike-in to each library, but if the signal-to-noise ratio differs between libraries (very common in ChIP/CUT applications), this is essentially a normalization by library size and therefore unreliable. I would just call peaks on the samples, make a count matrix on the merged peaks, and then use DESeq2 or edgeR to get proper size factors. See: ATAC-seq sample normalization

But maybe I simply do not understand the idea of spike-ins.
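
To illustrate what I mean by size factors from a count matrix, here is a rough Python sketch of the median-of-ratios idea that DESeq2 uses for its size factors. It is not DESeq2 itself, the function name `median_of_ratios` is made up, and the peak counts are toy numbers invented purely for illustration.

```python
import math
from statistics import median


def median_of_ratios(counts):
    """Per-sample size factors: median over peaks of count / geometric mean across samples.

    `counts` maps sample name -> list of read counts, one entry per merged peak.
    Peaks with a zero count in any sample are skipped.
    """
    samples = list(counts)
    n_peaks = len(counts[samples[0]])
    log_ratios = {s: [] for s in samples}
    for i in range(n_peaks):
        row = [counts[s][i] for s in samples]
        if min(row) <= 0:
            continue  # the geometric mean is undefined when a count is zero
        log_geo_mean = sum(math.log(c) for c in row) / len(row)
        for s, c in zip(samples, row):
            log_ratios[s].append(math.log(c) - log_geo_mean)
    return {s: math.exp(median(log_ratios[s])) for s in samples}


# Toy counts over three merged peaks (invented numbers, not real data).
toy_counts = {
    "Wt1": [10, 20, 40],
    "Wt2": [20, 40, 80],
    "Ko1": [10, 20, 40],
    "Ko2": [20, 40, 80],
}
size_factors = median_of_ratios(toy_counts)
for sample, sf in size_factors.items():
    print(f"{sample}\t{sf:.3f}")
```

In this toy case Wt2 and Ko2 have exactly twice the counts of Wt1 and Ko1 at every peak, so their size factors come out twice as large. Dividing each sample's counts by its size factor puts the samples on a common scale.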
