Beta-values, M-values and thresholds on effect size
8.6 years ago
gbayon

Hi everybody,

I am currently working on several projects using Illumina 450k DNA methylation microarrays. To detect differentially methylated probes (DMPs), I usually use the empirical Bayes method in the limma package. Following the paper by Pan Du et al., I use M-values to fit the statistical model and usually reserve beta-values for graphs and reports that will be read by fellow biologists.
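
For readers less familiar with the two scales, here is a minimal Python sketch of how they relate. It assumes the offset-free logit relationship M = log2(beta / (1 - beta)); in practice the Du et al. paper also adds small offsets to the methylated/unmethylated intensities, which this sketch ignores. The function names are illustrative and are not part of limma or minfi.

```python
import math


def beta_to_m(beta: float) -> float:
    """Convert a beta-value in (0, 1) to an M-value, assuming M = log2(beta / (1 - beta))."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m: float) -> float:
    """Inverse of beta_to_m: beta = 2**M / (1 + 2**M)."""
    return 2.0 ** m / (1.0 + 2.0 ** m)


if __name__ == "__main__":
    # Beta-values near 0 and 1 map to large negative/positive M-values; 0.5 maps to 0
    for b in (0.01, 0.2, 0.5, 0.8, 0.99):
        print(f"beta={b:.2f}  M={beta_to_m(b):+.3f}")
```

The logit stretches the ends of the beta scale, which is what makes M-values better behaved for linear modelling but also what drives the threshold disagreement discussed below.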

To retain only results with some biological relevance, we usually apply an effect-size threshold, keeping only probes that are both significant and have a large effect size. Nothing new or fancy so far. However, we often argue in the lab about the benefits and drawbacks of M-values versus beta-values when trying to set a coherent effect-size threshold.

The mathematical properties of M-values let us set a fixed threshold on effect size, while this is not as easy for betas. However, setting a threshold on M-value differences (say 1.4, as stated in the Du et al. paper) usually yields a set of probes where many show very small differences in beta-values (for example, 0.01), especially those near the minimum and maximum. This is very counterintuitive for a biologist, who will argue that such a small difference means nothing from a biological point of view.
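
The pattern described above follows directly from the logit shape of the transform. A minimal Python sketch, again assuming the offset-free relation M = log2(beta / (1 - beta)), shows how one fixed M-value difference (1.4 here) translates into very different beta-value differences depending on the starting beta-value. The helper names are hypothetical.

```python
import math


def beta_to_m(beta):
    # Offset-free logit2 transform (assumption: ignores the intensity offsets used in practice)
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m):
    return 2.0 ** m / (1.0 + 2.0 ** m)


def delta_beta_for_delta_m(beta_start, delta_m):
    """Beta-value difference produced by shifting beta_start by delta_m on the M scale."""
    return m_to_beta(beta_to_m(beta_start) + delta_m) - beta_start


if __name__ == "__main__":
    delta_m = 1.4
    # Start near the lower extreme, in the middle (shift centred on beta = 0.5), and near the upper extreme
    for start in (0.01, m_to_beta(-delta_m / 2), 0.97):
        print(f"start beta={start:.3f}  delta beta={delta_beta_for_delta_m(start, delta_m):.3f}")
```

Under these assumptions, a shift of 1.4 in M starting at beta = 0.01 moves beta by only about 0.016, while the same shift centred on 0.5 moves it by about 0.24.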

My intuition about what is happening there relates to the technical bias of the 450k. That is, I try to convince people that, because of the array design, a small difference in that region is as credible as a larger difference in the middle region (around 0.5 in beta), but I am not sure whether I am right, or whether I even understand correctly what is going on.

What do you usually do in your pipelines? Use M-values for the fit and beta-value differences as effect-size thresholds? A first threshold on M-values and a second filtering step based on betas? Everything with betas? Nothing at all?

Any hint will be much appreciated.

DNA Methylation Microarray 450k Illumina

Hi, what is the difference between the 450k microarray and next-generation sequencing?

Please take a few minutes to review this post: How To Ask Good Questions On Technical And Scientific Forums. It will help you formulate a proper question with sufficient detail.

Those are two completely different technologies. You should be able to search the web with "microarray" and "next generation sequencing" to find enough information.

Hi all,

Can we use M-values with next-generation sequencing data? Any help would be appreciated.

Thanks

Yes, my answer from 4 years ago applies equally to WGBS and similar NGS experiments.

Thank you, Devon Ryan.

Do not add answers unless you're answering the top-level question. This should be a reply to Devon's comment. Could you make the appropriate change, please? That would involve the following steps:

1. Copy the contents of your reply from this answer (you can edit this answer: Ctrl/Cmd + click the link to open it in a new tab and do a Select All -> Copy there).
2. Click on "Add Reply" on Devon's comment here: C: Beta-values, M-values and thresholds on effect size
3. Paste the copied text.
4. Back in your answer here, click on moderate: A: Beta-values, M-values and thresholds on effect size
5. Choose Delete Post.
6. Click on the blue Submit button.

Thank you!

8.6 years ago

Using M-values for statistics is by far the best way to go.

Regarding the utility of beta-values, it's good to filter by those as well. Why? Because a 1% change in methylation is unlikely to be biologically meaningful. I've seen a number of papers publish such changes, but it's always interesting to note that they never bother to show functional relevance (probably because there is none). If you can show the biological relevance of such small changes, then go ahead and follow up on them. Personally, I want to prioritize results by the likelihood that they cause some phenotype, and using the size of the methylation change (the beta-value difference) for that, and only that, makes sense.
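
The two-step filter discussed in this thread (significance from the M-value fit, then a minimum beta-value difference) can be sketched in plain Python. This is an illustration, not limma or minfi code: it assumes you already have raw per-probe p-values from the M-value model and per-probe beta-value differences, and it applies a hand-written Benjamini-Hochberg adjustment. The probe IDs, p-values, and thresholds (FDR 0.05, |delta beta| >= 0.2) are hypothetical placeholders.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone and capped at 1
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_dmps(probes, pvalues, delta_betas, fdr=0.05, min_delta_beta=0.2):
    """Keep probes significant on the M-value fit AND with a large enough beta difference."""
    adjusted = bh_adjust(pvalues)
    return [
        probe
        for probe, q, db in zip(probes, adjusted, delta_betas)
        if q < fdr and abs(db) >= min_delta_beta
    ]


if __name__ == "__main__":
    # Hypothetical toy values, for illustration only
    probes = ["cg_a", "cg_b", "cg_c", "cg_d"]
    pvalues = [1e-6, 2e-5, 0.03, 0.4]
    delta_betas = [0.25, 0.01, -0.30, 0.22]
    print(select_dmps(probes, pvalues, delta_betas))
```

Here "cg_b" is highly significant but dropped for its tiny beta difference, which is exactly the case the thread argues about.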

Edit: There's a corollary in other types of data. For example, we can often detect small changes in highly expressed genes when we do RNA-seq. Some of these are meaningful, but the really small changes aren't. I wouldn't toss these results, but I also wouldn't recommend anyone pursue them first when doing a follow-up.

I agree with you. I think the platform itself has enough technical noise that it is hard to believe changes with minimal beta differences have any relevance at all. I was planning to combine two decisions: A) whether a change is significant, using the adjusted p-values from testing on M-values, and B) whether the change makes biological sense, using the differences in beta-values. Do you think an additional threshold on M-value differences would add anything? I have seen that a 0.2 beta difference implies a minimum M difference of approximately 1.17, reached around a beta-value of 0.5. I think that could be coherent, but I just wanted to mention it.
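
The 1.17 figure can be checked numerically. A minimal Python sketch, assuming the offset-free relation M = log2(beta / (1 - beta)), scans starting beta-values for a fixed beta difference and reports the smallest resulting M difference. By symmetry the minimum sits at the interval centred on 0.5 (0.4 to 0.6), where the M difference is 2 * log2(1.5), about 1.17. The function names are hypothetical.

```python
import math


def beta_to_m(beta):
    # Offset-free logit2 transform (assumption)
    return math.log2(beta / (1.0 - beta))


def min_delta_m_for_delta_beta(delta_beta, steps=1000):
    """Return (smallest M difference, starting beta) over a grid of starting beta-values."""
    best_dm, best_low = math.inf, None
    for i in range(1, steps):
        low = i / steps
        high = low + delta_beta
        if high >= 1.0:  # the transform is undefined at beta = 1
            break
        dm = beta_to_m(high) - beta_to_m(low)
        if dm < best_dm:
            best_dm, best_low = dm, low
    return best_dm, best_low


if __name__ == "__main__":
    dm, low = min_delta_m_for_delta_beta(0.2)
    print(f"minimum delta M = {dm:.3f}, from beta {low:.3f} to {low + 0.2:.3f}")
```

Any other placement of a 0.2 beta difference gives a larger M difference, so an M threshold of about 1.17 is implied by, and weaker than, a 0.2 beta threshold.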

Your approach seems pretty reasonable to me :)

Regarding using M-value differences as a threshold, my guess is that this ends up being closely related to filtering by p-value (except that the p-value also incorporates the reliability of the M-value difference). I doubt it'll hurt anything, but I wouldn't expect to gain much. Having said that, it's been forever since I've had a dataset like this, so if you have one that contradicts my guess, please ignore this :)

Thank you, Devon Ryan.

@DevonRyan @gbayon How can I take effect size into account using M-values and then filter by the effect size obtained with beta-values? In my case, using beta-values I get log2 fold changes between -0.3 and 0.4, while using M-values I get log2 fold changes ranging from -3 to 4. So, first I can keep only the differentially methylated positions with a significant adjusted p-value obtained using M-values. Those positions show a large fold change, so they seem biologically significant.

But then, when I perform the same analysis using beta-values, the log2 fold changes range from -0.3 to 0.4, so they actually seem to be small variations.

Do you think it would be OK to keep only the probes with abs(log2FC) > 1 using M-values? I am worried about the low abs(log2FC) values I get when using beta-values instead.
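
A likely source of confusion here: for a simple two-group comparison, the coefficient that limma's topTable reports in its "logFC" column is the difference between the group means on whatever scale the data were supplied in. Fitted on M-values it is a difference of mean M-values; fitted on beta-values it is a difference of mean betas, not a log2 fold change, so the two ranges are not directly comparable. The plain-Python sketch below (no limma involved) uses hypothetical beta-values for a single probe to show the arithmetic, assuming the offset-free relation M = log2(beta / (1 - beta)).

```python
import math
from statistics import mean


def beta_to_m(beta):
    # Offset-free logit2 transform (assumption)
    return math.log2(beta / (1.0 - beta))


# Hypothetical beta-values for one probe in two groups of three samples
control = [0.05, 0.06, 0.04]
treated = [0.15, 0.18, 0.12]

# What a beta-value fit reports as "logFC": a difference of mean betas
delta_beta = mean(treated) - mean(control)
# What an M-value fit reports as "logFC": a difference of mean M-values
delta_m = mean(map(beta_to_m, treated)) - mean(map(beta_to_m, control))

print(f"delta beta = {delta_beta:.3f}, delta M = {delta_m:.3f}")
```

In this toy case a 0.10 change in mean beta near the low end corresponds to a difference in mean M of roughly 1.75, which is why M-based "logFC" values look much larger than beta-based ones.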

Please post this as a new question rather than as a comment on a 6-year-old post.

8.6 years ago

I have been using the minfi package to process 450k arrays, including the differential methylation part. Following the package vignettes, I used M-values to detect differential probes. (With two conditions, minfi applies an F-test to detect differential probes, so it is not too different from the limma package.) For reporting differential methylation, I averaged the beta-values within each of the two conditions and reported the difference of the averages. That's how I've done it... I'm interested in comments and opinions myself...
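
The reporting step described above (average the beta-values within each condition, then take the difference of the averages) is a simple per-probe calculation. A plain-Python sketch, assuming beta-values are stored as a dict mapping probe ID to a list of per-sample values, in the same sample order as a list of condition labels. This layout, the probe IDs, and the labels are hypothetical; it is not minfi's API.

```python
from statistics import mean


def mean_beta_difference(betas, groups, case, control):
    """Per-probe difference of mean beta-values: mean(case) - mean(control)."""
    case_idx = [i for i, g in enumerate(groups) if g == case]
    ctrl_idx = [i for i, g in enumerate(groups) if g == control]
    result = {}
    for probe, values in betas.items():
        case_mean = mean(values[i] for i in case_idx)
        ctrl_mean = mean(values[i] for i in ctrl_idx)
        result[probe] = case_mean - ctrl_mean
    return result


if __name__ == "__main__":
    # Hypothetical toy data: two probes, four samples
    groups = ["control", "control", "case", "case"]
    betas = {
        "cg_x": [0.10, 0.12, 0.40, 0.44],
        "cg_y": [0.50, 0.52, 0.51, 0.49],
    }
    print(mean_beta_difference(betas, groups, case="case", control="control"))
```

Reporting this difference alongside the M-value test result keeps the statistics on the M scale while giving biologists an effect size on the familiar 0 to 1 scale.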

I am also a big fan of the minfi package; I think it is great. And yes, I have done the same processing as you, which I think is quite common. I opened this post to get some insights or advice, because it is sometimes easy to get things wrong, even though I know this is a fairly simple question.