Question: Statistical test for DMR annotation?
Amit Lavon wrote, 5.6 years ago:

Hello friends,

I see that it was discussed here: A: Dmr (Differentially Methylated Regions) Identification Software but I would like to dig a little deeper into that, because I couldn't find a satisfying answer yet.

So - what statistical test would you choose for DMR (differentially methylated regions) annotation? Meaning you have a 2X2 table with column labels `WT` and `mutant`, and row labels `methylated` and `not methylated`, each cell has a count for a single region. You need to test whether methylation is dependent on the mutation.

I see that `methylKit` uses Fisher's Exact Test, but that test doesn't make sense to me. Why would DMRs follow a hypergeometric distribution? This assumes that the background set from which you sample is finite, right? And that's not the case with methylation: you can (theoretically) sample as much as you want, like coin flipping.
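For reference, here is a minimal sketch of what Fisher's Exact Test computes on such a 2×2 table, using only the Python standard library. The function name and the example counts are made up for illustration; this is not methylKit's code, only the textbook hypergeometric calculation with both margins held fixed.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: methylated / not methylated. Columns: WT / mutant.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def pmf(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = pmf(a)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(p for p in map(pmf, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Hypothetical counts for one region: WT 30 methylated / 10 not, mutant 12 / 28.
print(fisher_exact_two_sided(30, 12, 10, 28))
```

The fixed margins are exactly the "finite background" assumption questioned above: the test conditions on the total number of methylated and unmethylated counts actually observed.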

Am I right? What test would you use?

Thanks a lot, Amit

statistics dmr methylation • 3.2k views

If the no-replacement aspect of Fisher's test is what you don't like, then just do a binomial test instead. That said, the two approach each other with increasing N, and Charles' answer makes much more sense than either a Fisher's or a binomial test.
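The convergence with increasing N can be checked numerically. The sketch below compares the hypergeometric distribution (sampling without replacement, as in Fisher's test) with the binomial distribution (sampling with replacement) for growing population sizes. The function names, the 20 draws, and the 30% methylation rate are arbitrary assumptions chosen for illustration.

```python
from math import comb

def hypergeom_pmf(k, population, successes, draws):
    """P(k successes in `draws` draws without replacement)."""
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))

def binom_pmf(k, draws, p):
    """P(k successes in `draws` independent draws with success probability p)."""
    return comb(draws, k) * p**k * (1 - p) ** (draws - k)

def max_pmf_gap(population, draws=20, p=0.3):
    """Largest pointwise difference between the two PMFs for a population size."""
    successes = round(population * p)
    return max(
        abs(hypergeom_pmf(k, population, successes, draws) - binom_pmf(k, draws, p))
        for k in range(draws + 1)
    )

# The gap shrinks as the population grows relative to the number of draws.
for population in (50, 500, 5000, 50000):
    print(population, max_pmf_gap(population))
```

When the number of draws is small relative to the population, removing a drawn item barely changes the remaining proportions, which is why the two distributions end up nearly identical.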

(comment by Devon Ryan, 5.6 years ago)

Thank you Devon.

What do you think is the appropriate test for DMRs with a fixed-size window?
(reply by Amit Lavon, 5.6 years ago)

Charles Warden (Duarte, CA) wrote, 5.6 years ago:

I think that there are two types of DMR calculations: those with predefined region boundaries and those without predefined region boundaries.

If you have a predefined window (such as predefined regions of interest on the 450k array, targeted BS-Seq, or any sliding-window analysis), I think the main trick is the summarization (at least that is my opinion). For example, COHCAP averages the signal across CpG sites or across CpG islands and then applies a simple statistical test, such as an ANOVA, to the continuous signal. It also applies additional filters to reflect the fact that the original signal can likely be thought of as a discrete variable, where each CpG site is homozygous methylated, homozygous unmethylated, or heterozygous. methylKit and IMA also fall into this category. So, the short answer is that you may be able to use one of those other tools (or a similar strategy), but I think there are people out there who are satisfied with the methylKit results.
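To make the summarize-then-test idea concrete, here is a rough sketch: average the per-CpG signal within a predefined region for each sample, then compute a one-way ANOVA F statistic on those averages. This is not COHCAP's actual code; the function names and the beta values are hypothetical, and the p-value step is left out because the standard library has no F distribution (in practice, something like `scipy.stats.f_oneway` would provide it).

```python
from statistics import mean

def region_average(beta_values):
    """Average per-CpG methylation (beta values in [0, 1]) into one value per sample."""
    return mean(beta_values)

def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA over lists of values."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

# Hypothetical data: 3 WT and 3 mutant samples, 4 CpG sites in one predefined region.
wt = [[0.82, 0.78, 0.90, 0.85], [0.80, 0.75, 0.88, 0.83], [0.79, 0.81, 0.86, 0.80]]
mutant = [[0.40, 0.35, 0.52, 0.45], [0.38, 0.41, 0.47, 0.50], [0.44, 0.36, 0.49, 0.42]]

wt_means = [region_average(sample) for sample in wt]
mutant_means = [region_average(sample) for sample in mutant]
f_stat, df1, df2 = one_way_anova_f([wt_means, mutant_means])
print(f_stat, df1, df2)
```

Note that the unit of replication here is the sample, not the read, which is the main difference from a count-based test on a single 2×2 table.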

DMR tools without predefined boundaries (such as bumphunter in the minfi package or ChAMP) are a totally different beast. I don't think a Fisher's Exact Test would be appropriate in this situation.

If it helps, there are some script templates and limited benchmarks for a few such programs:

However, the original question was specifically for WGBS data (whereas the links above are for 450k data). Here, methylKit and bsseq are the main options that I know about. MethylSig is another option that I have heard about but not yet tried:


Thank you for the detailed answer.

My question is more about the validity of specific tests in the case of DMRs.

What do you think is the appropriate test for DMRs with a fixed-size window? Can you give reasons?
(reply by Amit Lavon, 5.6 years ago)

jordi (Johns Hopkins University) wrote, 3.0 years ago:

Look at the math in informME:

Jenkinson, G., Pujadas, E., Goutsias, J., & Feinberg, A. P. (2017). Potential energy landscapes identify the information-theoretic nature of the epigenome. Nat Genet, 49(5), 719–729. Retrieved from

None of the other tools accounts for correlation between CpG sites; the closest they get is some sort of smoothing technique. By assuming independence, they cannot control the false positive rate. In addition, differences in methylation do not necessarily have to be differences in mean. The probability distributions of the methylation state of a given region (a binary vector of a certain length) can have the same mean but completely different shapes: a bimodal and a unimodal distribution can have the same mean.
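A toy example of the last point, under made-up numbers: two distributions over per-read methylation patterns in a 4-CpG region with identical mean methylation but disjoint support. The Jensen-Shannon divergence is used here only as a generic way to compare two distributions; this sketch is not the informME implementation, and all names and values are hypothetical.

```python
from math import log2

def mean_methylation(dist):
    """Expected fraction of methylated CpGs under a distribution over binary patterns."""
    return sum(p * sum(pattern) / len(pattern) for pattern, p in dist.items())

def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (in bits) between two distributions over patterns."""
    patterns = set(p) | set(q)
    mix = {s: 0.5 * (p.get(s, 0.0) + q.get(s, 0.0)) for s in patterns}

    def kl(a):
        # Kullback-Leibler divergence from the mixture, skipping zero-probability terms.
        return sum(pa * log2(pa / mix[s]) for s, pa in a.items() if pa > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

# Hypothetical region with 4 CpG sites; keys are per-read methylation patterns.
# Bimodal: every read is either fully methylated or fully unmethylated.
bimodal = {(1, 1, 1, 1): 0.5, (0, 0, 0, 0): 0.5}
# Unimodal: every read has exactly half of its CpGs methylated.
unimodal = {(1, 1, 0, 0): 0.25, (0, 0, 1, 1): 0.25,
            (1, 0, 1, 0): 0.25, (0, 1, 0, 1): 0.25}

print(mean_methylation(bimodal), mean_methylation(unimodal))
print(jensen_shannon_divergence(bimodal, unimodal))
```

A test that only compares mean methylation sees no difference between these two regions, while a comparison of the full pattern distributions does.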
