I am currently working on several projects using Illumina 450k DNA methylation microarrays. To detect Differentially Methylated Probes (DMPs), I usually use the empirical Bayes method in the limma package. Following the paper by Pan Du et al., I stick with M-values to fit the statistical model, and use beta-values only in graphs and reports that are going to be read by fellow biologists.
To retain only results with some biological relevance, we usually apply a threshold on effect size, keeping only the probes that are both significant and show a large effect. Nothing new or fancy up to this point. However, it is not uncommon to end up arguing with colleagues in the lab about the benefits and drawbacks of M-values versus beta-values when trying to set a coherent effect size threshold.
The mathematical properties of M-values let us set a fixed threshold on effect size, while this is not so easy for betas. However, setting a threshold on M-value differences (say 1.4, as stated in the paper mentioned above) usually results in a set of probes where many show very small differences in beta-values (say, for example, 0.01), especially those near the minimum and maximum of the beta range. This is very counterintuitive for a biologist, who is going to argue against it on the grounds that such a small difference means nothing from a biological point of view.
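To make the mismatch concrete, here is a small sketch of how a fixed M-value difference translates into beta-value differences at different starting methylation levels. It assumes the idealized relationship M = log2(beta / (1 - beta)), which ignores the offset terms used when the values are computed from raw intensities; the function names are mine, chosen only for illustration.

```python
import math


def beta_to_m(beta):
    """Idealized logit (base 2) transform: M = log2(beta / (1 - beta))."""
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m):
    """Inverse of beta_to_m: beta = 2^M / (2^M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


def delta_beta_for_delta_m(beta_start, delta_m):
    """Beta-value change obtained by shifting the M-value of beta_start by delta_m."""
    return m_to_beta(beta_to_m(beta_start) + delta_m) - beta_start


DELTA_M = 1.4  # the M-value threshold mentioned in the post

# Same M-value shift, very different beta-value shifts depending on where we start.
for beta in (0.01, 0.05, 0.2, 0.5, 0.8, 0.95):
    change = delta_beta_for_delta_m(beta, DELTA_M)
    print(f"beta={beta:.2f}  delta_beta={change:.3f}")
```

Under this assumption, the same M-value shift gives a beta change of less than 0.02 when starting from beta = 0.01, but more than 0.2 when starting from beta = 0.5.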
My mental picture of what is happening there is related to the technical bias of the 450k. That is, I try to convince people that a small difference in that region is as credible as a bigger difference in the middle region (around 0.5 in beta), due to the array design, but I do not know whether I am right, or whether I even understand correctly what is going on.
What do you usually do in your pipelines? Use M-values for the fit, and beta-value differences as thresholds for effect size? A first threshold on M-values and a second filtering step based on betas? Everything with betas? Nothing at all?
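To be clear about what I mean by the two-step option, here is a minimal sketch. Everything in it is hypothetical: the probe records are made-up toy data, the field names are mine, and the 0.1 beta-difference cutoff is just a placeholder, not a recommendation. I assume the adjusted p-values and M-value differences come from a model fitted on M-values, and the beta differences are computed separately from group means.

```python
def two_step_filter(probes, max_adj_p=0.05, min_abs_delta_m=1.4,
                    min_abs_delta_beta=0.1):
    """Keep probes that are significant and pass both effect size thresholds.

    Each probe is a dict with the (hypothetical) keys
    'id', 'adj_p', 'delta_m' and 'delta_beta'.
    """
    kept = []
    for probe in probes:
        significant = probe["adj_p"] <= max_adj_p
        passes_m = abs(probe["delta_m"]) >= min_abs_delta_m
        passes_beta = abs(probe["delta_beta"]) >= min_abs_delta_beta
        if significant and passes_m and passes_beta:
            kept.append(probe["id"])
    return kept


# Toy records, invented purely for illustration.
toy_probes = [
    {"id": "probe_A", "adj_p": 0.001, "delta_m": 1.6, "delta_beta": 0.012},
    {"id": "probe_B", "adj_p": 0.002, "delta_m": 1.5, "delta_beta": 0.25},
    {"id": "probe_C", "adj_p": 0.30, "delta_m": 1.8, "delta_beta": 0.30},
    {"id": "probe_D", "adj_p": 0.01, "delta_m": 0.6, "delta_beta": 0.14},
]

selected = two_step_filter(toy_probes)
print(selected)
```

Here probe_A is the kind of case I am asking about: significant, with a large M-value difference, but with a beta difference that a biologist would dismiss.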
Any hint will be much appreciated.