When running Mann-Whitney U tests on count data from featureCounts, is it correct to convert the counts to pseudo-counts (by whatever method, e.g. x + 1), or should the zeros be left in? Previously I removed the zeros altogether when comparing counts over specific regions of the genome for different sets of genes. But then any difference I see only applies to genes that have reads in those regions at all. One of the sets in the comparison could have many genes with no reads in the region of interest, so once you take this into account, the set that appeared to have a higher number of reads could actually be depleted of reads overall.
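On the pseudo-count part specifically: Mann-Whitney U only uses ranks, so any strictly increasing transform (x + 1, log(x + 1)) leaves the statistic unchanged, whereas dropping zeros changes which genes get ranked at all. Below is a minimal sketch in plain Python with made-up counts (`set_a`, `set_b` and the helper names are hypothetical, not real data). In practice something like `scipy.stats.mannwhitneyu` would also give a p-value.

```python
import math


def midranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for x: rank sum of x minus n_x * (n_x + 1) / 2."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


# Hypothetical normalised counts per gene for two gene sets.
set_a = [0, 0, 0, 0, 0, 0, 12, 15, 20, 30]   # mostly empty, a few high
set_b = [0, 3, 4, 5, 6, 7, 8, 9, 10, 11]     # mostly covered, moderate

u_keep = mann_whitney_u(set_a, set_b)
u_pseudo = mann_whitney_u([v + 1 for v in set_a], [v + 1 for v in set_b])
u_log = mann_whitney_u([math.log1p(v) for v in set_a],
                       [math.log1p(v) for v in set_b])

# Dropping zeros changes the sample itself, not just the scale.
a_nz = [v for v in set_a if v > 0]
b_nz = [v for v in set_b if v > 0]
u_drop = mann_whitney_u(a_nz, b_nz)

# U / (n_x * n_y) estimates P(x > y) + 0.5 * P(x == y).
effect_keep = u_keep / (len(set_a) * len(set_b))
effect_drop = u_drop / (len(a_nz) * len(b_nz))
print(u_keep, u_pseudo, u_log, effect_keep, effect_drop)
```

With these toy numbers U / (n_a * n_b) is 0.43 with zeros kept and 1.0 with zeros removed, which is the kind of reversal described above.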
Please add details. What kind of data do you have? What are the sample sizes? What do you compare? How did you normalize? If this is non-single-cell NGS data, then the answer is probably something like "use DESeq2, edgeR, or limma".
The data is ChIP-seq data performed in duplicate. I'm comparing read counts over specific regions of the genome BETWEEN different sets of genes within conditions rather than between conditions, which is why I haven't used DESeq2/edgeR/limma. I normalised by the number of mapped reads.
I think for this to be meaningful you would also need to correct for mappability and GC content. Different regions may have strikingly different counts simply because of GC bias and the uniqueness of the region, rather than because of biology.
Even after correcting, you will still have to show that more reads (or whatever score you end up with) means more protein bound to DNA, using low-level, gold-standard methods. I recall a paper doing this with RNAseq; it's not a perfect correlation, but it works overall. I think RNAseq is much easier than ChIP-seq, as the biological interpretation is broader.
Any advice on how to correct for these things? Can I use the inputs or IgG samples to correct for mappability by dividing by the counts for either of these samples?
You do a lot of black magic until you plot the numbers you get against the G/C content and see a line that looks straight. Don't divide counts unless they are closer to a normal distribution than to a Poisson one.
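One way to make the "plot against G/C content" check concrete is to bin regions by GC fraction and look at the median score per bin. I am reading "straight" as "no trend with GC", which is an interpretation on my part. A rough sketch with toy sequences and made-up scores (all names and values here are hypothetical):

```python
from statistics import median


def gc_fraction(seq):
    """Fraction of G/C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return float("nan")
    return (seq.count("G") + seq.count("C")) / acgt


def median_by_gc_bin(gc_values, scores, n_bins=3):
    """Group scores into equal-width GC bins and return the median per bin."""
    bins = {}
    for gc, score in zip(gc_values, scores):
        # Clamp so that gc == 1.0 falls into the last bin.
        b = min(int(gc * n_bins), n_bins - 1)
        bins.setdefault(b, []).append(score)
    return {b: median(vals) for b, vals in sorted(bins.items())}


# Hypothetical (region sequence, normalised count) pairs.
regions = [
    ("ATATATATAT", 2.0),
    ("ATGCATATAT", 3.0),
    ("ATGCATGCAT", 5.0),
    ("GCGCATGCAT", 8.0),
    ("GCGCGCGCAT", 13.0),
    ("GCGCGCGCGC", 21.0),
]

gc_values = [gc_fraction(seq) for seq, _ in regions]
scores = [score for _, score in regions]
trend = median_by_gc_bin(gc_values, scores)
print(trend)
```

In this toy data the per-bin median rises with GC, so a GC effect would still be present. After a correction that works, the per-bin medians should come out roughly equal.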