I am integrating two data sets, gene expression and RRBS methylation, from the same samples of a non-model animal, so promoters and other regulatory regions are not well defined. I am comparing group A with group B, and I have lists of (i) differentially expressed genes and (ii) genes associated with differentially methylated CpGs between A and B. I have identified the genes that overlap between the two lists and compared each gene's expression fold change with the difference in methylation at its associated CpGs.

I believe I am correct in using individual CpGs rather than averaging over the whole promoter region or gene body. My feeling was that averaging might dilute any methylation differences that are present, especially as the promoter regions are an arbitrary length applied to all genes. What are others' thoughts on this? I have read papers that looked at CpG methylation across promoters and gene bodies as a whole, and I am beginning to doubt the logic I started with.
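To make the dilution concern concrete, here is a minimal sketch with made-up numbers. The values, the variable `promoter_cpg_diffs` and the helper `summarise_region` are all hypothetical and only illustrate how one strongly shifted CpG can shrink once it is averaged with unchanged neighbours in a fixed-length promoter window.

```python
from statistics import mean

# Hypothetical per-CpG methylation differences (group A minus group B, in
# percentage points) for the CpGs covered in one gene's promoter window.
promoter_cpg_diffs = [0.5, -1.0, 32.0, 1.5, -0.5, 0.0]


def summarise_region(diffs):
    """Return the region-wide mean difference and the single largest
    (by absolute value) per-CpG difference."""
    strongest_diff = max(diffs, key=abs)
    return {"mean_diff": mean(diffs), "strongest_diff": strongest_diff}


summary = summarise_region(promoter_cpg_diffs)
print(summary)
```

In this toy case the one CpG differing by 32 points is averaged down to roughly 5 points across the window, which is the effect I was worried about. It does not say whether that single CpG is functionally meaningful.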

Also, when looking at the correlation between CpG methylation levels and normalised gene expression counts, I am thinking it is best to work gene by gene, using the "expression fold change vs CpG methylation difference" comparison above to pick specific candidate genes, i.e. those meeting minimum thresholds for both fold change and methylation difference. Does that sound reasonable?
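The gene-by-gene idea above could be sketched roughly as follows. Everything here is an assumption for illustration: the thresholds (`MIN_ABS_LOG2FC`, `MIN_ABS_METH_DIFF`), the gene and CpG names, and the per-sample values are invented, and Spearman rank correlation is just one reasonable choice of statistic, not something my data dictate.

```python
from statistics import mean

MIN_ABS_LOG2FC = 1.0       # assumed threshold on |log2 fold change|
MIN_ABS_METH_DIFF = 20.0   # assumed threshold on |methylation difference| (points)

# Hypothetical gene/CpG pairs from the overlap of the two lists.
pairs = [
    {"gene": "geneA", "cpg": "cpg1", "log2fc": -1.8, "meth_diff": 30.0},
    {"gene": "geneB", "cpg": "cpg2", "log2fc": 0.3, "meth_diff": 35.0},
    {"gene": "geneC", "cpg": "cpg3", "log2fc": 2.1, "meth_diff": -5.0},
]

# Keep only pairs that pass both thresholds.
candidates = [
    p for p in pairs
    if abs(p["log2fc"]) >= MIN_ABS_LOG2FC
    and abs(p["meth_diff"]) >= MIN_ABS_METH_DIFF
]


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-sample values for one candidate (three samples per group).
methylation = [80.0, 75.0, 85.0, 40.0, 50.0, 45.0]    # % methylation at cpg1
expression = [12.0, 20.0, 10.0, 150.0, 90.0, 110.0]   # normalised counts, geneA
rho = spearman(methylation, expression)
print([p["gene"] for p in candidates], rho)
```

With real data each candidate gene/CpG pair would get its own correlation across all samples, and with only a handful of samples per group the estimates will be noisy.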

I really hope this makes sense.


A methylated promoter generally inhibits expression. If you examine just one individual CpG rather than the average across the promoter, how can you deduce its functional relevance?

But surely differential methylation of just one CpG can alter gene expression? It doesn’t have to be methylation of the promoter as a whole. See -

