I have DNase-seq data for mouse embryonic stem cells (mESCs) and mouse embryonic fibroblast cells (MEFs). By their nature, stem cells have a more globally accessible chromatin structure than somatic cells. I plotted the TSS coverage (normalised to RPM) of stem-cell-specific genes and found that MEFs had much higher coverage. Is it possible that, because there is less accessible chromatin in the MEFs, I am seeing higher coverage simply because it is a less complex library? Are there any normalisation methods that take the accessibility of the genome into account?
Are the reads you are normalizing to the reads from the entire run, or only the reads within regions around TSSs? You want the former, not the latter, since the latter implicitly assumes that total signal is the same between your two samples (when you know you should get more total signal from mESCs than from MEFs). One way to normalize would be to identify 'background' (closed) regions and normalize the signal to those. That is what DBChIP does for ChIP-seq; I'm not sure whether differential DNase-seq methods have been developed yet, or what modifications they would require compared with differential ChIP-seq.
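To make the two denominators concrete, here is a minimal Python sketch. It is not DBChIP's actual procedure, only an illustration of the idea: `rpm` scales by all mapped reads in the run, while `background_scale_factors` rescales each sample so that reads in presumed-closed regions match across samples. Both function names and all read counts are made up for illustration, and how you define the background regions is left as an assumption.

```python
def rpm(count, total_mapped_reads):
    """Reads per million, using ALL mapped reads in the run as the denominator."""
    return count * 1_000_000 / total_mapped_reads


def background_scale_factors(background_counts):
    """Per-sample factors that equalise reads in presumed-closed regions.

    background_counts maps sample name -> reads falling in background regions.
    Each sample is scaled towards the mean background count across samples.
    """
    reference = sum(background_counts.values()) / len(background_counts)
    return {sample: reference / count for sample, count in background_counts.items()}


# Hypothetical numbers, for illustration only (not real data).
total_mapped = {"mESC": 20_000_000, "MEF": 20_000_000}
background = {"mESC": 4_000_000, "MEF": 8_000_000}
tss_reads = {"mESC": 1_000, "MEF": 1_500}

factors = background_scale_factors(background)
for sample in tss_reads:
    by_rpm = rpm(tss_reads[sample], total_mapped[sample])
    by_background = tss_reads[sample] * factors[sample]
    print(sample, by_rpm, by_background)
```

With equal sequencing depth, RPM leaves the raw ratio between samples unchanged, whereas the background-based factors can shift it in either direction depending on how many reads each library puts into closed regions.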
However, I don't think this will entirely solve your problem, which, as I understand it, is that the MEFs have higher coverage on the open chromatin. In ChIP-seq data you see similar issues when the quality of the antibody varies.
It is best not to interpret the raw coverage, but only the significant enrichments (peaks) identified by a peak caller.