I am working on DNA methylation data downloaded from the TCGA portal, specifically Illumina 450K array data, and I am using the minfi package for my analysis. I initially used dmpFinder to identify CpG sites, but I get more than 100k sites, which is far too many for my analysis. I also looked at bumphunter, which gives differentially methylated regions (DMRs). The problem is that I struggle with interpreting DMRs. Specifically, I want to feed them into my machine learning pipeline, but I am struggling to quantify a DMR. Any suggested methods would be really helpful.
I also recently analysed methylation data from TCGA. I simply applied a Wilcoxon signed-rank test (in R) to each probe and adjusted the P values for FDR. For each probe, I also calculated the difference in mean β value between tumours and normals.
A statistically significant probe was then defined as one that passed 5% FDR and had an absolute difference in mean β value greater than 0.15.
Comparing tumour and normal on the 450K chip, this gives me ~75,000 significant probes, which is what I would expect between tumour and normal. If I go to 1% FDR and an absolute difference in mean β greater than 0.60, I get just 303 probes.