Plausibility of performing t-test on methylation beta values.
4.0 years ago

Can one compare methylation beta values, such as those available from TCGA, using a t-test? This quotation is from: An evaluation of statistical methods for DNA methylation microarray data analysis, BMC Bioinformatics. 2015; 16: 217.

Currently available methylation differential analysis methods implemented in Bioconductor/R include several approaches such as Wilcoxon rank sum test (used in methyAnalysis package), t-test (used in methyAnalysis, CpGAssoc, RnBeads, and IMA package), Kolmogorov-Smirnov Tests (although not implemented in packages, but used by some investigators [10]), permutation test (used in CpGAssoc package), empirical Bayes method (used in RnBeads, IMA and minfi package), and bump hunting method (used in bumphunter and minfi package).

What I mean is: if the beta values for condition A are c(0.5,0.5,0.5,0.50001), are they considered different from c(0.51,0.51, 0.51, 0.51)?


> t.test( c(0.5,0.5,0.5,0.50001), c(0.51,0.51, 0.51, 0.51))$p.value
[1] 3.44839e-11

But don't we know that a beta value of 0.5 by itself means the data are heterogeneous?

DNA methylation beta value statistical test • 1.5k views
4.0 years ago

Well, the t-test null hypothesis in this case is that the two group means are equal. The p-value indicates that we can reject this hypothesis and conclude that the means differ. Looking at the values, one can easily see by eye that the means are indeed different, and that there is almost no variation within either group.
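To see where such a small p-value comes from, the Welch statistic can be recomputed by hand. This is only a sketch using the two vectors from the question; the variable names are my own and not from any package:

```r
a <- c(0.5, 0.5, 0.5, 0.50001)
b <- c(0.51, 0.51, 0.51, 0.51)

# Welch standard error: b has zero variance and a has almost none
se <- sqrt(var(a) / length(a) + var(b) / length(b))
t_manual <- (mean(a) - mean(b)) / se

# t.test() uses the Welch version by default (var.equal = FALSE)
res <- t.test(a, b)
c(manual = t_manual, t.test = unname(res$statistic))
```

The difference in means is only about 0.01, but the standard error is several orders of magnitude smaller, so the statistic becomes very large. The test is reacting to the near-zero spread, not to a large effect.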

The Wilcoxon test, which is what I use for methylation analysis for 'paired' samples, helpfully gives a warning that it cannot compute an exact p-value because of ties (the warning was originally in Portuguese, as I use R in Portuguese; it is translated here):

wilcox.test( c(0.5,0.5,0.5,0.50001), c(0.51,0.51, 0.51, 0.51))

    Wilcoxon rank sum test with continuity correction

data:  c(0.5, 0.5, 0.5, 0.50001) and c(0.51, 0.51, 0.51, 0.51)
W = 0, p-value = 0.01771
alternative hypothesis: true location shift is not equal to 0

Warning message:
In wilcox.test.default(c(0.5, 0.5, 0.5, 0.50001), c(0.51, 0.51,  :
  cannot compute exact p-value with ties

Whether this is important in the context of methylation beta values is another question. As they are measured on a scale from 0 to 1, perhaps some other test is more appropriate. However, you can guard against issues like the one you describe by also calculating the difference between the group means:

mean(c(0.5,0.5,0.5,0.50001)) - mean(c(0.51,0.51, 0.51, 0.51))
[1] -0.0099975

So, that particular probe would be filtered out because the difference in means is so small, despite the low p-value.

...but, who knows, if we measured beta values on a scale from 0 to 1 000 000, would your values then look a lot 'better' (?). The difference is technically ~1% in the level of methylation, which may or may not have clinical relevance.
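On the rescaling thought: the t-test does not change when every value is multiplied by the same constant, so only the raw difference would look bigger. A quick check with the vectors from the question (the names `scale_factor`, `p_original`, and so on are my own):

```r
a <- c(0.5, 0.5, 0.5, 0.50001)
b <- c(0.51, 0.51, 0.51, 0.51)
scale_factor <- 1e6  # the hypothetical 0 to 1 000 000 scale

# The p-value is the same on either scale, so this ratio is essentially 1
p_original <- t.test(a, b)$p.value
p_rescaled <- t.test(a * scale_factor, b * scale_factor)$p.value
p_rescaled / p_original

# The raw difference in means, by contrast, grows with the scale
diff_original <- mean(a) - mean(b)
diff_rescaled <- mean(a * scale_factor) - mean(b * scale_factor)
c(original = diff_original, rescaled = diff_rescaled)
```

This is why any cutoff on the difference in means only makes sense relative to the 0 to 1 range of beta values.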



Thank you Kevin Blighe. I'll go for non-parametric tests to compare beta values. Another question: is there a threshold (like |logFC|>1 for determining DEGs) used for differentially methylated regions/genes, or do statistical tests suffice? When I analyse methylation data with GEO2R, I get logFC among other columns.


I have seen people use a difference in mean beta value of 0.1, 0.15, or 0.2. There is no standard, though. You should definitely filter on both:

  1. p-value from statistical test
  2. difference in mean
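As a rough sketch of applying both filters across many probes: the data below are simulated toy values, not real methylation data, and the matrix names, the 0.05 and 0.2 cutoffs, and the Benjamini-Hochberg adjustment are my own assumptions rather than a standard.

```r
set.seed(1)  # toy data only
n_probes <- 200

# Hypothetical beta matrices: rows are probes, columns are samples
grp1 <- matrix(rbeta(n_probes * 6, 5, 5), nrow = n_probes)
grp2 <- matrix(rbeta(n_probes * 6, 5, 5), nrow = n_probes)
# Give the first 10 probes a simulated shift in methylation
grp2[1:10, ] <- matrix(rbeta(10 * 6, 40, 10), nrow = 10)

# 1. p-value from a statistical test, one test per probe
p_raw <- sapply(seq_len(n_probes), function(i) {
  wilcox.test(grp1[i, ], grp2[i, ])$p.value
})
p_adj <- p.adjust(p_raw, method = "BH")  # assumption: adjust for multiple testing

# 2. difference in mean beta value per probe
delta <- rowMeans(grp2) - rowMeans(grp1)

# Keep a probe only if it passes both criteria (example cutoffs)
keep <- p_adj < 0.05 & abs(delta) > 0.2
which(keep)
```

Swapping in another test or another cutoff only changes the two lines that define `p_raw` and `keep`.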
