I have many human ChIP-seq samples that I need to compare. All of them received the same 10% spike-in of mouse chromatin. At first I was going to simply normalize each sample to its total number of spike-in reads. However, I've now been asked to do something a little more sophisticated and need some advice.
I have used the Irreproducible Discovery Rate (IDR) method to find ChIP-seq peaks (called by MACS2) that are highly reproducible and high confidence. I need to use only these confident peak regions for my normalization. That is, for each sample I have counted the total mouse spike-in reads that fall within the confident peak regions. For example, for one antibody I have this scenario:
Sample              Reads within confident peaks
Wildtype rep. 1     1.02 million
Wildtype rep. 2     0.78 million
Mutant rep. 1       1.01 million
Mutant rep. 2       0.60 million
What is the best way to normalize the samples in this case? And should I still normalize to reads per million (RPM) before applying the spike-in scaling factor? I think I should probably skip the RPM step and just use the spike-in. Hopefully this question makes sense to someone. Thanks
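For concreteness, here is a rough sketch of the spike-in-only scaling I have in mind, using the counts from the table above. The function name and the choice of the smallest count as the reference are my own assumptions for illustration (any fixed reference would give the same relative scaling), and I am not claiming this is the correct approach.

```python
# Spike-in reads within confident peaks, taken from the table above.
spike_in_reads = {
    "Wildtype rep. 1": 1.02e6,
    "Wildtype rep. 2": 0.78e6,
    "Mutant rep. 1": 1.01e6,
    "Mutant rep. 2": 0.60e6,
}


def spike_in_scale_factors(counts):
    """Return one scale factor per sample: reference count / sample count.

    Assumption: the sample with the fewest spike-in reads is the reference,
    so every other sample is scaled down towards it. No RPM step is applied.
    """
    reference = min(counts.values())
    return {sample: reference / n for sample, n in counts.items()}


factors = spike_in_scale_factors(spike_in_reads)
for sample, factor in factors.items():
    print(f"{sample}: scale factor {factor:.3f}")
```

Each sample's human signal (for example its coverage track) would then be multiplied by its factor. Whether an RPM step should come before this is exactly what I am unsure about.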