I am currently working on normalizing some ChIP-seq data I've generated to a spike-in control. The ChIP was performed in mouse cells and our spike-in was human chromatin.
I've read the Active Motif kit instructions on how to do this (https://www.activemotif.com/catalog/1091/chip-normalization) and talked to a few different people, but I am still not sure exactly how to perform this normalization.
I've tried several different methods, but here is the general workflow:
Align to a merged genome containing both mouse and human chromosomes.
Pull out all alignments to the human genome and count them.
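A minimal sketch of the counting step, assuming the human chromosomes were given a distinguishing prefix when the merged reference was built. The prefix "human_" and the MAPQ cutoff of 10 are hypothetical choices. The sketch counts only primary, mapped alignments with reasonable MAPQ, so reads that map equally well to both species are not credited to either genome. This is similar in spirit to filtering with samtools view -F 0x904 -q 10 and counting with -c. For paired-end data, each mate is counted separately here.

```python
from collections import Counter
from typing import Iterable

# Hypothetical prefix: assumes human contigs were renamed (e.g. "human_chr1")
# when the merged mouse + human reference was built.
HUMAN_PREFIX = "human_"

# SAM FLAG bits for records that should not be counted.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_reads_by_genome(sam_lines: Iterable[str], min_mapq: int = 10) -> Counter:
    """Count primary, confidently placed alignments per genome from SAM text lines."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        rname = fields[2]
        mapq = int(fields[4])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # unmapped, secondary or supplementary record
        if mapq < min_mapq:
            continue  # ambiguous placement, possibly in both genomes
        genome = "human" if rname.startswith(HUMAN_PREFIX) else "mouse"
        counts[genome] += 1
    return counts
```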
Use the sample with the smallest number of reads mapping to the human genome to create a normalization factor. For example:
- Sample 1 (1,000,000 reads)
- Sample 2 (2,000,000 reads) | Normalization Factor = 1,000,000/2,000,000 = 0.5
- Sample 3 (3,000,000 reads) | Normalization Factor = 1,000,000/3,000,000 = 0.33
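The factor calculation above can be written as a short function. This is a sketch that reproduces the example numbers. The sample names are just dictionary keys.

```python
def spikein_factors(human_counts: dict[str, int]) -> dict[str, float]:
    """Scale every sample down to the one with the fewest spike-in (human) reads."""
    smallest = min(human_counts.values())
    return {name: smallest / n for name, n in human_counts.items()}


factors = spikein_factors(
    {"Sample 1": 1_000_000, "Sample 2": 2_000_000, "Sample 3": 3_000_000}
)
for name, f in factors.items():
    print(f"{name}: {f:.2f}")
```

The sample with the fewest human reads always gets a factor of 1.0, so it is the only sample that is not subsampled.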
From the reads that aligned to the mouse genome, randomly subsample the fraction given by the normalization factor and map these. I have been subsampling with the -s option of samtools view. For example:
- Sample 1 (10,000,000 mouse reads) --> Map all 10,000,000
- Sample 2 (30,000,000 mouse reads) --> Map 50% of these, or 15,000,000
- Sample 3 (60,000,000 mouse reads) --> Map 33% of these, or 20,000,000
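One way to turn each factor into a samtools view -s argument is sketched below. The argument has the form INT.FRAC, where the integer part is the random seed and the fractional part is the fraction of reads to keep. A factor of 1.0 cannot be written this way, so that sample is simply not subsampled. The file names, the seed of 42 and the 4-digit rounding are assumptions. Subsampling is random, so the kept counts will be close to, but not exactly, 15,000,000 and 20,000,000.

```python
import shlex


def samtools_subsample_cmd(bam: str, out: str, factor: float, seed: int = 42) -> list[str]:
    """Build a samtools view command that keeps roughly `factor` of the reads.

    samtools view -s takes INT.FRAC: INT is the seed, .FRAC the fraction kept.
    """
    frac = round(factor, 4)
    if frac <= 0.0:
        raise ValueError("factor is too small to express with 4 decimal places")
    if frac >= 1.0:
        # Reference sample (factor 1.0): keep every read, no -s option.
        return ["samtools", "view", "-b", "-o", out, bam]
    frac_digits = f"{frac:.4f}".split(".")[1]  # e.g. 0.5 -> "5000"
    return ["samtools", "view", "-b", "-s", f"{seed}.{frac_digits}", "-o", out, bam]


# Hypothetical file names; factors taken from the example above.
for name, factor in [("sample1", 1.0), ("sample2", 0.5), ("sample3", 1 / 3)]:
    cmd = samtools_subsample_cmd(f"{name}.mouse.bam", f"{name}.mouse.sub.bam", factor)
    print(shlex.join(cmd))
```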
Convert the SAM file to a BED file and extend the reads to the fragment length.
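The read-extension step might look like the sketch below. Each read is extended in its 3' direction, so plus-strand reads grow to the right and minus-strand reads grow to the left, and the result is clipped to the chromosome bounds. Coordinates follow the BED convention (0-based start, exclusive end). The fragment length of 200 bp is an assumed value and should be replaced with an estimate for the actual library.

```python
from typing import NamedTuple


class Read(NamedTuple):
    chrom: str
    start: int   # 0-based, BED convention
    end: int     # exclusive
    strand: str  # "+" or "-"


def extend_read(read: Read, fragment_length: int, chrom_size: int) -> Read:
    """Extend a read toward its 3' end to the estimated fragment length."""
    if read.strand == "+":
        start = read.start
        end = min(read.start + fragment_length, chrom_size)
    else:
        end = read.end
        start = max(read.end - fragment_length, 0)
    return Read(read.chrom, start, end, read.strand)


# Assumed fragment length of 200 bp on a hypothetical 10 kb chromosome.
print(extend_read(Read("chr1", 500, 550, "-"), 200, 10_000))
```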
Use this BED file to generate a bigWig file for viewing the data.
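The coverage step that feeds a bigWig can be sketched as a sweep over fragment start and end events. The sweep produces bedGraph rows (chrom, start, end, depth), which a tool such as UCSC's bedGraphToBigWig can then convert together with a chrom.sizes file. The scale argument is an optional multiplier added only for illustration. It stays at 1.0 here because the reads were already subsampled.

```python
from collections import defaultdict


def bedgraph_from_fragments(fragments, scale: float = 1.0):
    """Turn (chrom, start, end) fragments into bedGraph rows (chrom, start, end, value).

    Only positions where coverage changes are visited; zero-coverage gaps are
    omitted. Chromosomes and positions come out sorted in byte order.
    """
    events = defaultdict(lambda: defaultdict(int))
    for chrom, start, end in fragments:
        events[chrom][start] += 1  # coverage goes up where a fragment starts
        events[chrom][end] -= 1    # and down where it ends (exclusive)
    rows = []
    for chrom in sorted(events):
        depth = 0
        prev = None
        for pos in sorted(events[chrom]):
            if prev is not None and depth > 0 and pos > prev:
                rows.append((chrom, prev, pos, depth * scale))
            depth += events[chrom][pos]
            prev = pos
    return rows


for row in bedgraph_from_fragments([("chr1", 0, 10), ("chr1", 5, 15)]):
    print(*row, sep="\t")
```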
To be more specific, I have a WT line, a heterozygous knockout and a homozygous knockout of a protein, and I have done ChIP for that protein in each line. However, when I normalize with the method above, the homozygous knockout shows higher signal than the WT at the binding sites of the knocked-out protein.
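Writing the subsampling procedure as a formula may help when comparing the WT and knockout tracks. Here H_i and M_i are symbols introduced only for this sketch. They denote the human (spike-in) and mouse read counts of sample i, f_i is its normalization factor, and M_i' is the number of mouse reads kept after subsampling:

```latex
f_i = \frac{\min_j H_j}{H_i}, \qquad
M_i' = f_i \, M_i = \Bigl(\min_j H_j\Bigr)\,\frac{M_i}{H_i}, \qquad
\frac{M_i'}{M_k'} = \frac{M_i / H_i}{M_k / H_k}.
```

For the example numbers, M_i / H_i is 10, 15 and 20, which gives the 10,000,000, 15,000,000 and 20,000,000 reads listed above. Under this scheme, the total depth of each normalized track is proportional to that sample's mouse-to-human read ratio. Comparing that ratio directly between the WT and homozygous knockout samples may therefore show where the higher knockout signal comes from.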
Does my method sound correct? Can anyone provide a script or instructions on how exactly to perform a spike-in normalization?
Thanks for all your help!