I am integrating two data sets, gene expression and RRBS methylation, from the same samples of a non-model animal, so promoters and other regulatory regions are not very well defined. I am comparing group A to group B. I have lists of (i) differentially expressed genes and (ii) genes associated with CpGs that are differentially methylated between A and B. I've identified the overlap between these gene lists and explored each gene's expression fold change against the difference in methylation level at those CpGs.

I believe I am correct in using individual CpGs rather than averaging over the whole promoter region or gene body. I felt that averaging might dilute any methylation differences that are present, especially as the promoter regions are an arbitrary length applied to all genes. I wondered what others' thoughts are on this: I have read papers that looked at CpG methylation across promoters and gene bodies as a whole, and I'm beginning to doubt the logic I started with.
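To make the dilution concern concrete, here is a toy sketch in Python. The numbers are made up for illustration and are not from my data, and the function names are just placeholders: one CpG in an arbitrary promoter window differs strongly between groups while its neighbours barely change.

```python
# Hypothetical per-CpG methylation differences (group A minus group B,
# in percentage points) inside one fixed-length promoter window.
# These values are invented purely to illustrate the averaging effect.
cpg_diffs = [0.5, -1.0, 0.0, 40.0, 1.5, -0.5, 0.0, 1.0, -1.5, 0.0]


def region_mean_diff(diffs):
    """Average the methylation difference over every CpG in the window."""
    return sum(diffs) / len(diffs)


def max_abs_cpg_diff(diffs):
    """Return the single CpG difference with the largest magnitude."""
    return max(diffs, key=abs)


mean_diff = region_mean_diff(cpg_diffs)
top_diff = max_abs_cpg_diff(cpg_diffs)

print(f"region mean difference: {mean_diff:.1f}")
print(f"largest single-CpG difference: {top_diff:.1f}")
```

In this toy case the single CpG would pass a typical per-CpG difference cutoff, but the window average would not, which is the effect I was worried about.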
Also, when looking at the correlation between CpG methylation levels and normalised gene expression counts, I am thinking it is best to do this on a gene-by-gene basis, using the above-mentioned "expression fold change vs CpG methylation difference" comparison to pick specific candidate genes, i.e. those meeting minimum thresholds for both fold change and methylation difference. Does that sound reasonable?
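In case it helps to see what I mean, this is a minimal sketch of the approach in plain Python. The gene names, CpG coordinates, per-sample values, and both cutoffs (`min_abs_log2fc`, `min_abs_meth_diff`) are hypothetical placeholders, and Spearman correlation is simply my assumption for a rank-based measure, not something I am sure is the right choice.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def pick_candidates(pairs, min_abs_log2fc=1.0, min_abs_meth_diff=25.0):
    """Keep gene/CpG pairs that pass BOTH (hypothetical) thresholds."""
    return [
        p for p in pairs
        if abs(p["log2fc"]) >= min_abs_log2fc
        and abs(p["meth_diff"]) >= min_abs_meth_diff
    ]


# Invented gene/CpG pairs: expression log2 fold change (A vs B) and
# methylation difference in percentage points (A minus B).
pairs = [
    {"gene": "geneX", "cpg": "scaffold1:1050", "log2fc": -2.1, "meth_diff": 32.0},
    {"gene": "geneY", "cpg": "scaffold3:880", "log2fc": 0.4, "meth_diff": 41.0},
    {"gene": "geneZ", "cpg": "scaffold7:212", "log2fc": 1.8, "meth_diff": 6.0},
]
candidates = pick_candidates(pairs)

# Invented per-sample values for one candidate (three samples per group):
# methylation (%) at the CpG and normalised expression of its gene.
methylation = [80, 75, 85, 40, 45, 50]
expression = [12, 15, 10, 60, 48, 55]
rho = spearman(methylation, expression)

print([c["gene"] for c in candidates])
print(f"Spearman rho for the candidate: {rho:.2f}")
```

Only pairs passing both cutoffs go forward, and the correlation is then computed separately for each remaining gene/CpG pair across samples.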
I really hope this makes sense.