I have a list of gene pairs, and I want to know whether the two genes in each pair are differentially enriched for a particular histone mark under the same condition. I did some research, and it seems that all existing software is designed to compare the same gene across different conditions.
Initially I thought of using the number of peaks (1). Then I realised that some histone marks can be really broad, so I thought of using the sum of bp under peaks in the gene region / length of the gene region (2). But some of the marks that interest me, H3K4me2 for example, are reported as gapped peaks (ENCODE project), which means they can be both broad and narrow. Some people told me that in my case I should use the read counts directly and try to fit a linear model based on log(ReadCountGene1 / ReadCountGene2) (3). We noted that this lacks normalisation, so we concluded that, instead of the raw read count, I should use ReadCountGene1IP / ReadCountGene1Input (4). But it seems to me that these last two methods lack the statistical significance that peak calling provides and simply estimate the fold enrichment, which to me is a marker of the abundance of the mark in the cell population and adds little information about differences in the number of peaks or in the extent of the region under peaks.
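To make the candidate measures concrete, here is a minimal Python sketch of how I understand measures (2), (3) and (4). Everything in it is an assumption for illustration: the function names, the half-open coordinates, the pseudocount, and the scaling of IP and input counts to reads per million are my own choices, not taken from any existing tool.

```python
import math


def peak_coverage_fraction(gene_start, gene_end, peaks):
    """Measure (2): bp under peaks in the gene region / length of the region.

    Coordinates are assumed half-open [start, end); `peaks` is a list of
    (start, end) tuples on the same chromosome as the gene.
    """
    length = gene_end - gene_start
    if length <= 0:
        raise ValueError("gene region must have positive length")
    # Clip peaks to the gene region and drop those that do not overlap it.
    clipped = sorted(
        (max(s, gene_start), min(e, gene_end))
        for s, e in peaks
        if e > gene_start and s < gene_end
    )
    covered = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Close the previous merged interval and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping peaks are merged so no bp is counted twice.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / length


def enrichment(ip_count, input_count, ip_total, input_total, pseudocount=1.0):
    """Measure (4): IP / input for one gene region.

    Assumption: counts are scaled to reads per million using the library
    totals, and a pseudocount avoids division by zero.
    """
    ip_rpm = (ip_count + pseudocount) / ip_total * 1e6
    input_rpm = (input_count + pseudocount) / input_total * 1e6
    return ip_rpm / input_rpm


def log2_pair_ratio(value_gene1, value_gene2):
    """Measure (3): log ratio between the two genes of a pair."""
    return math.log2(value_gene1 / value_gene2)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    frac = peak_coverage_fraction(100, 200, [(90, 120), (110, 130), (150, 160)])
    e1 = enrichment(200, 50, 10_000_000, 10_000_000)
    e2 = enrichment(100, 100, 10_000_000, 10_000_000)
    print(frac, log2_pair_ratio(e1, e2))
```

One thing the sketch makes visible: because IP and input are counted over the same region in (4), the gene length cancels out, whereas the raw ratio in (3) compares two regions of different lengths.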
I am really confused right now about how to analyse these data and about the biological relevance of each of these methods.
Any hint or relevant remark is welcome!