I have data from bisulfite-converted, epigenome-wide probe capture covering individual CpG sites. The data were processed with methylKit, so each CpG site has a coordinate position along with a differential methylation score, p-value and q-value. I would like to use this list of individual differentially methylated CpGs to identify a set of differentially methylated genes and then perform pathway analysis on that set.
Until now I have been using HOMER to annotate each CpG with a gene name and associated gene information (function, distance to TSS, location in the gene, etc.), and I have been satisfied with that process. That gave me a list of differentially methylated genes, which I then used for pathway analysis. However, now that I am running the pathway analysis, I'm concerned that I am not representing my data accurately. For over-representation based tools I suppose it's not a large concern, since they only test whether a pathway is enriched for genes on my list. But for the newer methods that use an expression score to generate a ranked list, what is the best way to combine multiple CpG differential methylation scores into a single value per gene?
Is there a method that accounts for genes having different numbers of CpGs, and for the differential methylation varying from site to site within a gene? I am primarily concerned with promoter regions. Should I just tile the entire promoter region (say, -2000 bp to the TSS) and use that single value as the methylation for the gene? Or is there a way to represent the effect of each CpG site when calculating a differential methylation score for a gene?
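To make the question concrete, here is a minimal sketch of one naive per-gene summary, not something I am claiming is the right approach. It keeps CpGs in the -2000 bp to TSS window, averages their methylation differences, and combines their p-values with Fisher's method. The field names (`gene`, `dist_to_tss`, `meth_diff`, `pvalue`) and the signed ranking score are hypothetical, not actual methylKit or HOMER column names, and Fisher's method assumes independent tests, which neighbouring CpGs probably are not.

```python
import math
from collections import defaultdict


def fisher_combined_p(pvalues):
    """Combine p-values with Fisher's method (assumes independent tests)."""
    k = len(pvalues)
    # half_stat is X/2, where X = -2 * sum(ln p) ~ chi-square with 2k df
    half_stat = -sum(math.log(max(p, 1e-300)) for p in pvalues)
    # Closed-form survival function for even degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def summarize_promoters(cpgs, upstream=2000):
    """Collapse CpG-level results to one row per gene (promoter CpGs only)."""
    by_gene = defaultdict(list)
    for cpg in cpgs:
        # Assumed convention: negative distance means upstream of the TSS
        if -upstream <= cpg["dist_to_tss"] <= 0:
            by_gene[cpg["gene"]].append(cpg)

    summary = {}
    for gene, sites in by_gene.items():
        mean_diff = sum(s["meth_diff"] for s in sites) / len(sites)
        combined_p = fisher_combined_p([s["pvalue"] for s in sites])
        # Hypothetical ranking score: direction of change times significance
        sign = (mean_diff > 0) - (mean_diff < 0)
        rank_score = sign * -math.log10(max(combined_p, 1e-300))
        summary[gene] = {
            "n_cpgs": len(sites),
            "mean_diff": mean_diff,
            "combined_p": combined_p,
            "rank_score": rank_score,
        }
    return summary


# Made-up example values, for illustration only
example_cpgs = [
    {"gene": "GENE_A", "dist_to_tss": -150, "meth_diff": 20.0, "pvalue": 0.01},
    {"gene": "GENE_A", "dist_to_tss": -900, "meth_diff": 10.0, "pvalue": 0.04},
    {"gene": "GENE_A", "dist_to_tss": 500, "meth_diff": -50.0, "pvalue": 0.001},
    {"gene": "GENE_B", "dist_to_tss": -300, "meth_diff": -15.0, "pvalue": 0.05},
]
gene_summary = summarize_promoters(example_cpgs)
```

A plain mean like this treats every CpG equally and ignores how many sites a gene has, which is exactly the part I am unsure how to handle properly.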
Thanks