Dear BioStar Users,

I am using minfi to analyze the results of an EPIC array. To check differential methylation I decided to look for differentially methylated regions (DMRs) using bumphunter as recommended.

`bumphunter`

function returns candidate DMRs with several columns of information including `value`

and `area`

(clarification in this biostars post).

For 2 groups, `value`

column is the average of the coefficients of the linear model calculated for each site within the DMR while `area`

column is the sum of all coefficients. This can be easily checked by calculating the difference, between the 2 groups, of the mean value (M or beta) for each methylation site of the DMR and then averaging (`value`

column) or summing (`area`

column) these differences.

Thus, a positive `value`

indicates more methylation in the non-control group, and the opposite for a negative `value`

. Until here, everything is clear and makes sense.

My problem is that I have 3 groups, say control, drug A and drug B for example. But, `bumphunter`

still returns a unique `value`

and `area`

. How do you interpret the `value`

and `area`

of each candidate DMR? What do they mean in terms methylation levels in each group? How would you filter them to get biological information?

To answer these questions I went to check the function code but got more confused.

`bumphunter`

function can be accessed by typing `getMethod("bumphunter", "matrix")`

and you get:

```
function (object, ...)
{
.local <- function (object, design, chr = NULL, pos, cluster = NULL,
coef = 2, cutoff = NULL, pickCutoff = FALSE, pickCutoffQ = 0.99,
maxGap = 500, nullMethod = c("permutation", "bootstrap"),
smooth = FALSE, smoothFunction = locfitByCluster, useWeights = FALSE,
B = ncol(permutations), permutations = NULL, verbose = TRUE,
...)
{
nullMethod <- match.arg(nullMethod)
if (missing(design))
stop("design must be specified")
if (missing(pos))
stop("If object is a matrix, pos must be specified")
bumphunterEngine(object, design = design, chr = chr,
pos, cluster = cluster, coef = coef, cutoff = cutoff,
pickCutoff = pickCutoff, pickCutoffQ = pickCutoffQ,
maxGap = maxGap, nullMethod = nullMethod, smooth = smooth,
smoothFunction = smoothFunction, useWeights = useWeights,
B = B, permutations = NULL, verbose = verbose, ...)
}
.local(object, ...)
}
```

So we find that it is, basically, a S4 method to call the `bumphunterEngine`

function that does most of the job.

So, typing `bumphunterEngine`

you get:

```
function (mat, design, chr = NULL, pos, cluster = NULL, coef = 2,
cutoff = NULL, pickCutoff = FALSE, pickCutoffQ = 0.99, maxGap = 500,
nullMethod = c("permutation", "bootstrap"), smooth = FALSE,
smoothFunction = locfitByCluster, useWeights = FALSE, B = ncol(permutations),
permutations = NULL, verbose = TRUE, ...)
{
...
if (useWeights & smooth) {
tmp <- .getEstimate(mat = mat, design = design, coef = coef,
full = TRUE)
rawBeta <- tmp$coef
weights <- 1/tmp$sigma
rm(tmp)
}
else {
rawBeta <- .getEstimate(mat = mat, design = design, coef = coef,
full = FALSE)
weights <- NULL
}
...
else {
beta <- rawBeta
Index <- seq(along = beta)
}
...
comp <- computation.tots2(tab = tab, A = A)
rate2 <- comp$rate2
pvalues2 <- comp$pvalues2
tab$p.value <- pvalues1
tab$fwer <- rate1
tab$p.valueArea <- pvalues2
tab$fwerArea <- rate2
tab <- tab[order(tab$fwer, -tab$area), ]
algorithm <- list(version = packageDescription("bumphunter")$Version,
coef = coef, cutoff = cutoff, pickCutoff = pickCutoff,
pickCutoffQ = pickCutoffQ, smooth = smooth, maxGap = maxGap,
nullMethod = nullMethod, B = B, permutations = permutations,
useWeights = useWeights, smoothFunction = deparse(substitute(smoothFunction)))
ret <- list(table = tab, coef = rawBeta, fitted = beta, pvaluesMarginal = pvs,
null = list(value = V, length = L), algorithm = algorithm)
class(ret) <- "bumps"
return(ret)
}
```

It is a long function, but the relevant parts for this problem IMO are `coef = 2`

, in the parameters of the function, and `.getEstimate(mat = mat, design = design, coef = coef, ...)`

which calculates the coefficients.

Regarding `coef = 2`

, we can see that the original `bumphunter`

function already has it defined as defaults. Also, knowing that bumhunter was originally defined to compare 2 groups I went to check the minfi wrapper to see if they implemented a change.

```
setMethod(
"bumphunter",
signature(object = "GenomicRatioSet"),
function(object, design, cluster = NULL, coef = 2, cutoff = NULL,
pickCutoff = FALSE, pickCutoffQ = 0.99, maxGap = 500,
nullMethod = c("permutation", "bootstrap"), smooth = FALSE,
smoothFunction = locfitByCluster, useWeights = FALSE,
B = ncol(permutations), permutations = NULL, verbose = TRUE,
type = c("Beta","M"), ...) {
.isMatrixBackedOrStop(object, "bumphunter")
type <- match.arg(type)
bumphunterEngine(
mat = getMethSignal(object, type),
design = design,
chr = as.factor(seqnames(object)),
pos = start(object),
cluster = cluster,
coef = coef,
cutoff = cutoff,
pickCutoff = pickCutoff,
pickCutoffQ = pickCutoffQ,
maxGap = maxGap,
nullMethod = nullMethod,
smooth = smooth,
smoothFunction = smoothFunction,
useWeights = useWeights,
B = B,
permutations = permutations,
verbose = verbose,
...)
}
)
```

We see that the minfi wrapper basically fits the right matrix of data (`mat = getMethSignal(object, type)`

) to dhe `bumphunterEngine`

function keeping the `coef = 2`

option.

At this point, I went to check the other function `.getEstimate(mat = mat, design = design, coef = coef, ...)`

which calculates the coefficients, if I understood properly, based on the data matrix (`mat = mat`

), the experimental design (`design = design`

) and the coefficient (`coef = coef`

, which gets set to 2). The code can be found here.

```
.getEstimate <- function(mat, design, coef, B=NULL, permutations=NULL,
full=FALSE) {
v <- design[,coef]
A <- design[,-coef, drop=FALSE]
qa <- qr(A)
S <- diag(nrow(A)) - tcrossprod(qr.Q(qa))
vv <- if (is.null(B)) matrix(v, ncol=1) else{
if(is.null(permutations)){
replicate(B, sample(v))
} else{
apply(permutations,2,function(i) v[i])
}
}
sv <- S %*% vv
vsv <- diag(crossprod(vv,sv))
b <- (mat %*% crossprod(S, vv)) / vsv
if(!is.matrix(b))
b <- matrix(b, ncol = 1)
if (full) {
sy <- mat %*% S
df.residual <- ncol(mat) - qa$rank - 1
if (is.null(B)) {
sigma <- matrix(sqrt(rowSums((sy - tcrossprod(b, sv))^2) / df.residual), ncol=1)
} else {
sigma <- b
tmp <- sy
for (j in 1:B) {
tmp <- tcrossprod(b[,j], sv[,j])
sigma[,j] <- rowSums((sy-tmp)^2)
}
sigma <- sqrt(sigma/df.residual)
}
out <- list(coef=b,
sigma=sigma,
stdev.unscaled=sqrt(1/vsv),
df.residual=df.residual)
if (is.null(B)) out$stdev <- as.numeric(out$stdev)
} else {
out <- b
}
return(out)
}
```

and my design would be something like:

```
des.groups <- c(rep("control", 2), rep("drug.A", 2), rep('drug.B', 2))
design <- model.matrix(~ des.groups)
design
(Intercept) des.groupsdrug.A des.groupsdrug.B
1 1 0 0
2 1 0 0
3 1 1 0
4 1 1 0
5 1 0 1
6 1 0 1
```

I am worried that, just at the beggining of the `.getEstimate`

function, it subsets the design based on the `coef`

argument

```
v <- design[,coef]
```

getting

```
> v
1 2 3 4 5 6
0 0 1 1 0 0
```

However this is as far as I can go with my linear algebra knowledge. So I am wondering if the `value`

and `area`

returned by `bumphunter`

reflect the mehylation changes of "just" 2 groups (in my case control (Intercept) and des.groupsdrug.A).

What do you think? If this is the case, how would you address the comparison among 3 groups? with 3 pairwise comparisons?

If you arrived here, THANKS for reading the long post and please share your thoughts.