I have MBD-seq datasets from Dnmt1 knockout (KO) and control cells.
As expected, the Dnmt1 KO sample covers far fewer genomic regions than the control, since Dnmt1 is absent in the KO. The problem is that the KO reads are concentrated in these few regions, which makes their signal intensity very high.
What I'm curious about is how I should normalize data like this, where the samples are expected to differ in how their reads are distributed across the genome.
For example, if the KO and control samples have 100 and 10,000 detected bins (bins with > 0 reads), respectively, and each has one million total reads, then each detected bin in the KO will on average have 100 times more reads than in the control, which biases the quantification of MBD-seq enrichment.
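To make the arithmetic in the example concrete, here is a minimal sketch using only the toy numbers above (one million reads per sample, 100 vs. 10,000 detected bins, with reads assumed to be spread evenly over the detected bins). The function name is illustrative only.

```python
def mean_reads_per_bin(total_reads, detected_bins):
    """Average reads per detected bin, assuming reads are spread evenly."""
    return total_reads / detected_bins

total_reads = 1_000_000  # same library size for both samples (toy value)
ko = mean_reads_per_bin(total_reads, 100)         # 10,000 reads per bin
control = mean_reads_per_bin(total_reads, 10_000)  # 100 reads per bin

print(f"KO: {ko:.0f}, control: {control:.0f}, ratio: {ko / control:.0f}")
# KO: 10000, control: 100, ratio: 100
```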
Would it be OK to subsample 1/100 of the reads from the KO sample to compensate for this difference?
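As a rough illustration of what the proposed subsampling does, the sketch below keeps each read independently with probability 0.01, applied to hypothetical per-bin counts for a toy KO sample (1,000,000 reads spread evenly over 100 bins). In practice this would more likely be done on the BAM file itself (e.g., with `samtools view -s`); the function and data here are assumptions for illustration only.

```python
import random

def subsample_reads(bin_counts, fraction, seed=0):
    """Keep each read independently with probability `fraction`.

    `bin_counts` is a hypothetical list of per-bin read counts.
    """
    rng = random.Random(seed)
    return [sum(1 for _ in range(c) if rng.random() < fraction) for c in bin_counts]

# Toy KO sample: 1,000,000 reads spread evenly over 100 bins.
ko_counts = [10_000] * 100
ko_sub = subsample_reads(ko_counts, fraction=0.01)

print(sum(ko_sub))                # close to 10,000 reads retained
print(sum(ko_sub) / len(ko_sub))  # close to 100 reads per bin, like the control
```

Note that subsampling has the same expected effect as dividing every KO bin by 100, but it also adds sampling noise, and bins with only a few reads can drop to zero.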
What does everyone think?