I want to normalize data from an enrichment-based method (e.g., MeDIP, which captures methylated DNA sites).
Let's say that I have two samples: a real sample (in which modified DNA is targeted) and a dummy control (in which reads are randomly distributed along the genome). The number of reads in the real sample is N times greater than the number of reads in the dummy control.
My question is: should I normalize the number of reads between the two samples?
Case A: On the one hand, it is logical to normalize the number of reads, as I will probably want to compare mean coverage between my samples. In this case, I can divide the coverage per CG by the total number of reads in that sample.
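To make case A concrete, here is a minimal sketch of what I mean by dividing per-CG coverage by the total number of reads. The function name `normalize_by_depth`, the per-million scaling factor, and all the numbers are made up for illustration; they are not taken from any real data set or package.

```python
def normalize_by_depth(coverage, total_reads, scale=1_000_000):
    """Return per-CG coverage expressed as reads per `scale` sequenced reads.

    coverage    -- list of raw read counts, one per CG site
    total_reads -- total number of reads in the sample (library size)
    scale       -- hypothetical scaling constant (per million reads here)
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return [count * scale / total_reads for count in coverage]


# Made-up example: the real sample has N = 10 times more reads than the control.
real_cov = [40, 0, 12, 8]   # raw coverage at four CG sites, real sample
ctrl_cov = [2, 1, 0, 1]     # raw coverage at the same sites, dummy control

real_norm = normalize_by_depth(real_cov, total_reads=20_000_000)
ctrl_norm = normalize_by_depth(ctrl_cov, total_reads=2_000_000)

for site, (r, c) in enumerate(zip(real_norm, ctrl_norm), start=1):
    print(f"CG {site}: real={r:.2f} control={c:.2f}")
```

After this scaling both samples are on the same "per million reads" scale, which is exactly what removes any genuine difference in total read count between them.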
Case B: On the other hand, maybe the lower number of reads in the dummy control is the result of a biological process (e.g., the control sample has no methylated DNA sites, so there are no targets to be enriched, and that is why we get a much lower number of reads for this sample).
I know that a common strategy is to normalize the number of reads. But what if the difference in the number of reads is itself a biological result? Can we know this? I am interested in how the community deals with this kind of problem.