This is a good question and one, I think, with no single correct answer.
You're correct that the default option for the IMA function indexregionfunc is median; it also offers two other options:
For each specific region of a gene, IMA will collect the loci within it
and derive an index of overall region methylation value. Currently,
there are three different index metrics implemented in IMA: mean,
median, and Tukey’s Biweight robust average. By default, the mean beta
values will be used as the region’s methylation index for further
The problem we want to address is how best to summarize the measurements from the methylation array probes associated with a transcript into a single value that indicates a DMR (differentially methylated region). This is rather different from the equivalent problem on other kinds of array. For example, to summarize exon expression probesets to a transcript, we might take the median RMA value of the core probesets. On a methylation array, the probes sit in different types of region (CpG island, shore, shelf, in-gene), and the genomic annotation is frequently less well characterized.
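To make the three index metrics quoted above concrete, here is a minimal Python sketch of collapsing one region's probe beta-values into a single number. It is not IMA's own code (IMA is an R package), and the function names, the tuning constants `c` and `eps`, and the example values are all my assumptions. The biweight shown is a simple one-step version and may differ in detail from whatever IMA calls internally.

```python
from statistics import mean, median


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight average (illustrative; c and eps are assumed defaults)."""
    values = list(values)
    centre = median(values)
    # Median absolute deviation: a robust estimate of spread.
    mad = median([abs(v - centre) for v in values])
    weighted_sum = 0.0
    weight_total = 0.0
    for v in values:
        u = (v - centre) / (c * mad + eps)
        # Points roughly c MADs or more from the median get zero weight.
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        weighted_sum += w * v
        weight_total += w
    return weighted_sum / weight_total


def summarize_region(betas):
    """Return all three candidate region indices for one set of probe beta-values."""
    return {
        "mean": mean(betas),
        "median": median(betas),
        "tukey_biweight": tukey_biweight(betas),
    }


# Made-up beta-values for five probes in one hypothetical region.
region_betas = [0.10, 0.12, 0.15, 0.85, 0.90]
for name, value in summarize_region(region_betas).items():
    print(f"{name}: {value:.3f}")
```

With values like these the three metrics can disagree substantially, which is exactly why the choice between them matters.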
To be frank, I suspect that many papers use the mean of CpG-associated probes because the authors are biologists who have not given much thought to the statistics. The mean is certainly one way to summarize multiple probes, but is it meaningful? Bear in mind that beta-values are strongly bimodal in distribution, whereas the mean describes the centre of a normal distribution. Likewise, some people probably use the median because of a vague notion that it is "better than the mean", but again, only in the context of a normal distribution.
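A toy illustration of the bimodality point, using made-up beta-values (three mostly unmethylated probes and three mostly methylated ones). The helper `gap_to_nearest_probe` is a hypothetical name of mine; it measures how far a summary value sits from the closest actual probe.

```python
from statistics import mean, median


def gap_to_nearest_probe(betas, summary):
    """Distance from a summary value to the closest observed beta-value."""
    return min(abs(b - summary) for b in betas)


# Invented bimodal region: values cluster near 0 and near 1, none in between.
bimodal_betas = [0.05, 0.08, 0.10, 0.90, 0.92, 0.95]

for name, summary in (("mean", mean(bimodal_betas)),
                      ("median", median(bimodal_betas))):
    gap = gap_to_nearest_probe(bimodal_betas, summary)
    print(f"{name}: {summary:.2f} (nearest probe is {gap:.2f} away)")
```

For this input both the mean and the median land at about 0.5, roughly 0.4 away from every probe, so neither describes anything that was actually measured.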
I've also seen people choose, for transcripts with multiple methylation probes, the probe with the highest variance, or the probe with the lowest p-value after analysing differences between two conditions, or take moving averages across N bases upstream of the gene, where N is anything from 500 to 2000 bp. In summary: I don't think anyone yet has a good handle on how to summarize methylation probe values to DMRs, and what people are doing is justifying essentially arbitrary decisions.
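For comparison, two of the ad hoc strategies mentioned above can be sketched in a few lines of Python. Everything here is hypothetical: the probe IDs, positions, and beta-values are invented, the gene is assumed to be on the plus strand, and a single fixed upstream window stands in for a true moving average.

```python
from statistics import mean, variance


def most_variable_probe(betas_by_probe):
    """ID of the probe whose beta-values have the highest sample variance."""
    return max(betas_by_probe, key=lambda pid: variance(betas_by_probe[pid]))


def upstream_window_mean(positions, betas_by_probe, tss, n_bp):
    """Per-sample mean over probes within n_bp upstream of the TSS.

    Assumes a plus-strand gene, so "upstream" means smaller coordinates.
    Returns None when no probe falls inside the window.
    """
    in_window = [pid for pid, pos in positions.items() if tss - n_bp <= pos < tss]
    if not in_window:
        return None
    n_samples = len(betas_by_probe[in_window[0]])
    return [mean([betas_by_probe[pid][i] for pid in in_window])
            for i in range(n_samples)]


# Invented data: three probes measured in four samples.
betas = {
    "probe_a": [0.10, 0.12, 0.11, 0.13],
    "probe_b": [0.20, 0.80, 0.25, 0.85],
    "probe_c": [0.50, 0.55, 0.52, 0.49],
}
positions = {"probe_a": 900, "probe_b": 400, "probe_c": 1200}

print(most_variable_probe(betas))
print(upstream_window_mean(positions, betas, tss=1000, n_bp=700))
```

Changing `n_bp` from 700 to 500 drops `probe_b` from the window and changes the result, which shows how much the answer depends on an arbitrary parameter.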