I am analysing Illumina 450K methylation data.
I have 79 samples to analyse in total.
I have been running my analysis on all CpG probes (pre-processed for p-value etc., with probes overlapping known SNPs removed), which brings the total number of probes to ~460,000.
If I use minfi or limma to look for sites that are differentially methylated between two groups (in my case, 13 samples with a mutation of interest versus 66 samples without) and I take sites/DMPs with an adjusted p-value (qvalue, BH adjusted) of <0.05, as in the package vignette/user guide, is this corrected for multiple testing across sites (i.e. across all 460,000 probes), across samples, or both?
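To make the question concrete, here is a minimal Python sketch of the Benjamini-Hochberg adjustment. It is not limma's or minfi's actual code; the function name `bh_adjust` and the toy p-values are made up for illustration. The input is one raw p-value per probe for the single contrast of interest, so the number of tests `m` is the number of probes and the sample count never enters the adjustment.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)  # number of tests = number of probes
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy input: one raw p-value per probe.
raw = [0.0001, 0.004, 0.03, 0.02, 0.5]
adj = bh_adjust(raw)
for p, q in zip(raw, adj):
    print(f"raw={p:.4f}  BH-adjusted={q:.4f}")
```

With 460,000 probes the same function would simply receive a list of 460,000 p-values.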
I have multiple covariates included in my design matrix but only one comparison of interest (samples with or without the mutation of interest), so I am assuming I do not need to correct for multiple contrasts (limma can do this too, I believe).
My concern is that, given such a large number of CpG probes, a small number of sites are bound to appear differentially methylated between the two groups by chance. I'm also not entirely clear how to interpret the results, as the output (limma topTable) reports many different statistics, such as the F-statistic, pvalue, qvalue and B value. The adjusted p-value (qvalue) seems to be the one to focus on. What is the explicit interpretation of a DMP with qvalue < 0.05? Would it be 'this site is differentially methylated between samples with and without the mutation of interest when corrected for multiple testing across sites'?
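The "by chance" worry can be illustrated with a small simulation. This is a sketch under simplifying assumptions: every probe is truly null and the tests are independent, which real methylation probes are not. The seed and variable names are arbitrary. It counts how many of ~460,000 null p-values pass a raw 0.05 cutoff and how many pass the BH step-up rule at the same level.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

m = 460_000     # approximate number of probes in the question
alpha = 0.05

# Under the null, p-values are uniform on [0, 1]: no probe truly differs.
null_p = sorted(random.random() for _ in range(m))

# Unadjusted: about m * alpha probes pass by chance alone.
raw_hits = sum(p < alpha for p in null_p)

# BH step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha,
# then call the k smallest p-values significant.
bh_hits = 0
for k, p in enumerate(null_p, start=1):
    if p <= k / m * alpha:
        bh_hits = k

print("expected by chance at raw p < 0.05:", round(m * alpha))
print("observed at raw p < 0.05:", raw_hits)
print("significant after BH at 0.05:", bh_hits)
```

The unadjusted count should land near 23,000. Under these assumptions, BH at 0.05 is expected to report any hits at all in no more than about 5% of such all-null datasets.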
Is it worth reducing noise and potentially increasing my degrees of freedom by restricting my analysis to e.g. the 1,000 most variable sites, as might be done when analysing DEGs? I haven't seen many examples of this for methylation data, though.
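By "most variable sites" I mean something like the following rough Python sketch. The probe IDs, values and the helper `most_variable` are hypothetical. Probes are ranked by variance across all samples, without looking at which samples carry the mutation.

```python
from statistics import variance

# Hypothetical toy matrix: probe ID -> methylation values across samples.
betas = {
    "cg_a": [0.10, 0.12, 0.11, 0.13],
    "cg_b": [0.20, 0.80, 0.30, 0.90],
    "cg_c": [0.50, 0.55, 0.45, 0.60],
}


def most_variable(values_by_probe, n):
    """Return the n probe IDs with the largest variance across samples."""
    ranked = sorted(
        values_by_probe,
        key=lambda probe: variance(values_by_probe[probe]),
        reverse=True,
    )
    return ranked[:n]


top = most_variable(betas, 2)
print(top)
```

In the real analysis `n` would be 1,000 and the input would be the full probe-by-sample matrix.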
Cheers