Question: How Bimodal Is A Methylation Distribution?
Analysing methylation from a sequencing experiment, we seem to be getting bimodally distributed values.

The question I was asked (and I'm still not sure whether it makes sense) is "Are you sure this is bimodal?". Needless to say, the plot doesn't give a clear "yes" or "no" answer. That is, it shows neither two well-defined peaks nor a single peak, but something in between. So the questions are:

1- Is there a way to assign a p-value (or any metric) on how bimodal a distribution is?

2- Does it even make sense to ask this question? (Why or why not?).

Apologies in advance if the question is too off topic.

1) The advice given in this CrossValidated thread seems solid. Really, if you've got a hairy stats question, those are the guys to ask.

2) A bimodal distribution of methylation scores is common, from what I understand. Here's a figure showing a distribution of methylation scores in CpG islands, taken from the supplement of this paper:

Harris, et al. Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications

Nature Link:

PMC Link:

[Figure from the paper's supplement]

A Google search on testing for modality picks up this bit of advice.

From work I am involved in on CpG islands, I would guess that your trends should be bimodal. This can be seen clearly in CpG island methylation in human and in full transcript methylation in Ciona.

There will be a Deaton et al. paper coming out in Genome Research later in the year that will also show that the 'CpG shores' hypothesis is less likely.

The reference you gave is about someone who comments on a method based on a package that he never used or even looked at.

Sorry, but it does seem a little bit too vague (plus it doesn't answer my question).

Also, minus one for referencing Google.

Very preliminary comment before heading home:

This looks like a mixture distribution problem. If your data look normal, then an EM algorithm and a likelihood ratio test could solve this.

I am sure this is not quite so easy.
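
To make the EM plus likelihood ratio idea a bit more concrete, here is a minimal sketch in Python (NumPy only). It is an illustration under assumptions, not a vetted method: it assumes the values are roughly normal on whatever scale you feed in (methylation fractions in [0, 1] would probably need a transform first), and all function names (`loglik_one`, `loglik_two`, `lrt_statistic`, `bootstrap_pvalue`) are made up for this example. Because the usual chi-square approximation is not reliable when comparing one versus two mixture components, the p-value here comes from a parametric bootstrap under the single-Gaussian null.

```python
import numpy as np

def normal_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def loglik_one(x):
    """Maximised log-likelihood of a single Gaussian (the null model)."""
    return normal_logpdf(x, x.mean(), x.std()).sum()

def loglik_two(x, n_iter=200, tol=1e-8, min_sigma=1e-3):
    """Log-likelihood of a two-component Gaussian mixture fitted by EM.

    Starting values come from the lower and upper halves of the data;
    min_sigma is an arbitrary floor that stops a component collapsing.
    """
    med = np.median(x)
    lo, hi = x[x <= med], x[x > med]
    w = 0.5  # weight of the first component
    mu = np.array([lo.mean(), hi.mean()])
    sigma = np.maximum(np.array([lo.std(), hi.std()]), min_sigma)
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: log joint of each point with each component
        logp = np.stack([
            np.log(w) + normal_logpdf(x, mu[0], sigma[0]),
            np.log1p(-w) + normal_logpdf(x, mu[1], sigma[1]),
        ])
        log_total = np.logaddexp(logp[0], logp[1])
        ll = log_total.sum()
        resp = np.exp(logp - log_total)  # responsibilities, shape (2, n)
        # M-step: weighted means, standard deviations and mixing weight
        nk = np.maximum(resp.sum(axis=1), 1e-12)
        mu = (resp * x).sum(axis=1) / nk
        sigma = np.sqrt((resp * (x - mu[:, None]) ** 2).sum(axis=1) / nk)
        sigma = np.maximum(sigma, min_sigma)
        w = float(np.clip(nk[0] / x.size, 1e-6, 1 - 1e-6))
        if ll - prev < tol:
            break
        prev = ll
    return ll

def lrt_statistic(x):
    """2 * (log-lik of two components - log-lik of one), floored at 0."""
    return max(0.0, 2.0 * (loglik_two(x) - loglik_one(x)))

def bootstrap_pvalue(x, n_boot=199, seed=0):
    """Parametric bootstrap p-value for 'one Gaussian' vs 'two Gaussians'."""
    rng = np.random.default_rng(seed)
    observed = lrt_statistic(x)
    mu, sigma = x.mean(), x.std()
    # Simulate from the fitted null and recompute the statistic each time
    null_stats = [lrt_statistic(rng.normal(mu, sigma, x.size))
                  for _ in range(n_boot)]
    exceed = sum(s >= observed for s in null_stats)
    return observed, (1 + exceed) / (n_boot + 1)

# Usage on simulated (not real) data: two well separated groups
rng = np.random.default_rng(1)
simulated = np.concatenate([rng.normal(-2, 0.7, 150), rng.normal(2, 0.7, 150)])
stat, pval = bootstrap_pvalue(simulated, n_boot=99)
print(f"LRT statistic = {stat:.1f}, bootstrap p = {pval:.3f}")
```

A small p-value only says that two Gaussians fit better than one; it does not by itself prove two modes, since a skewed or heavy-tailed unimodal distribution can also favour the two-component fit.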


Actually, I get a similar distribution from HumanMethylation450 data.

