I am currently working on several projects using Illumina 450k DNA methylation microarrays. To detect differentially methylated probes (DMPs), I usually use the empirical Bayes method in the limma package. Following the paper by Pan Du et al., I stick with M-values to fit the statistical model and use beta-values only in graphs and reports that are going to be read by fellow biologists.
To retain only results with some biological relevance, we usually apply a threshold on effect size, keeping only the probes that are both significant and have a large effect size. Nothing new or fancy up to this point. However, it is not uncommon to argue with colleagues at the lab about the benefits and drawbacks of M-values versus beta-values when trying to set a coherent effect size threshold.
The mathematical properties of M-values let us set a fixed threshold on effect size, which is not as easy for betas. However, setting a threshold on M-value differences (say 1.4, as stated in the paper above) usually results in a set of probes in which many show very small differences in beta-values (say, for example, 0.01), especially those near the minimum and maximum of the beta range. This is very counterintuitive for a biologist, who will argue against it on the grounds that such a small difference means nothing from a biological point of view.
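To make the mismatch concrete, here is a minimal Python sketch. It assumes the usual relationship between the two scales, M = log2(beta / (1 - beta)), and ignores the small offset terms used when the values are computed from raw intensities. The function names and the baseline beta-values are only illustrative, not part of any package.

```python
import math

def beta_to_m(beta):
    """Base-2 logit transform: M = log2(beta / (1 - beta))."""
    return math.log2(beta / (1.0 - beta))

def m_to_beta(m):
    """Inverse transform: beta = 2^M / (2^M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)

def delta_beta_for_delta_m(baseline_beta, delta_m):
    """Beta difference obtained by shifting the baseline by delta_m on the M scale."""
    return m_to_beta(beta_to_m(baseline_beta) + delta_m) - baseline_beta

DELTA_M = 1.4  # threshold mentioned above; the baselines below are made up
for baseline in (0.01, 0.05, 0.2, 0.5):
    shift = delta_beta_for_delta_m(baseline, DELTA_M)
    print(f"baseline beta = {baseline:.2f} -> delta beta = {shift:.3f}")
```

The same M-value difference maps to a beta difference of less than 0.02 when the baseline is 0.01, but to more than 0.2 when the baseline is 0.5, which is exactly the behaviour that looks odd in a report written in betas.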
My mental picture of what is happening is related to the technical bias of the 450k. That is, I try to convince people that, because of the array design, a small difference in that region is as credible as a bigger difference in the middle region (around 0.5 in beta). But I do not know whether I am right, or whether I am even seeing correctly what is going on.
What do you usually do in your pipelines? Use M-values for the fit and beta-value differences as thresholds for effect size? A first threshold on M-values and a second filtering step based on betas? Everything with betas? Nothing at all?
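To be clear about what I mean by the first option, here is a rough Python sketch. It assumes the adjusted p-values have already been obtained from a model fitted on M-values (for example with limma) and that group mean betas are available per probe. The `ProbeResult` record, the `select_dmps` function and both cutoffs (0.05 and 0.1) are hypothetical and only there for illustration.

```python
from dataclasses import dataclass

@dataclass
class ProbeResult:
    probe_id: str
    adj_p: float        # adjusted p-value from the model fitted on M-values
    mean_beta_a: float  # mean beta-value in group A
    mean_beta_b: float  # mean beta-value in group B

def select_dmps(results, max_adj_p=0.05, min_delta_beta=0.1):
    """Keep probes that are significant on the M scale AND differ enough on the beta scale."""
    selected = []
    for r in results:
        delta_beta = r.mean_beta_b - r.mean_beta_a
        if r.adj_p <= max_adj_p and abs(delta_beta) >= min_delta_beta:
            selected.append((r.probe_id, delta_beta))
    return selected

# Made-up probes: significant but tiny beta shift, significant with a large
# shift, and a large shift that is not significant.
example = [
    ProbeResult("probe_1", 0.001, 0.01, 0.03),
    ProbeResult("probe_2", 0.001, 0.40, 0.65),
    ProbeResult("probe_3", 0.300, 0.40, 0.65),
]
print(select_dmps(example))
```

With these made-up values only `probe_2` passes both filters; `probe_1` is dropped purely because of the beta threshold, which is the kind of probe the discussion above is about.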
Any hint will be much appreciated.
Hi, what is the difference between the 450k microarray and next-generation sequencing?
Please take a few minutes to review this post: How To Ask Good Questions On Technical And Scientific Forums. It will help you formulate a proper question with sufficient detail.
Those are two completely different technologies. You should be able to search the web with "microarray" and "next generation sequencing" to find enough information.
Can we use M-values with next-generation sequencing data? Any help, please.
Yes, my answer from 4 years ago applies equally to WGBS and similar NGS experiments.
Thank you, Devon Ryan.
Do not add answers unless you're answering the top-level question. This should be a reply to Devon's comment. Could you make the appropriate change, please? The answer in question is here: A: Beta-values, M-values and thresholds on effect size