How to do DMR analysis with BiSeq (for RRBS methylation data)?
Entering edit mode
6.3 years ago
ahmad mousavi ▴ 800


I have 20 RRBS samples which I wanted to do do deferentially methylated region (DMR) analysis with BiSeq or other R packages, however the manual resource is not enough. I want to do DMR analysis on all chromosomes not specific regions.

Is there anybody who has experience with BiSeq:

> readBismark(files, colData)
> metadata <- list(Sequencer = "Instrument", Year = "2013")
> rowRanges <- GRanges(seqnames = "chr1",ranges = IRanges(start = c(1,2,3), end = c(1,2,3)))
> colData <- DataFrame(group = c("cancer", "control"), row.names = c("sample_1", "sample_2"))
> totalReads <- matrix(c(rep(10L, 3), rep(5L, 3)), ncol = 2)
> methReads <- matrix(c(rep(5L, 3), rep(5L, 3)), ncol = 2)
> BSraw(metadata = metadata,rowRanges = rowRanges,colData = colData,totalReads = totalReads,methReads = methReads)
sequencing DMR methylation • 4.0k views
Entering edit mode

This does not answer your BiSeq question directly, but I've been very happy using metilene for DMR calling.

Entering edit mode
5.2 years ago

My personal preference is to use methylKit or COHCAP for identifying Differentially Methylated Regions

However, I have tested BiSeq for one or two projects.

I think the short answer can be described as follows:

1) First, I think you need to save your BSraw() object.

I had a slightly different way to do this (from the Bismark coverage files), but I kind of have to give advice in terms of my own experience:

BSraw.obj = readBismark(comp.files, as.character(sample.table$sampleID))

2) At least if you follow the instructions (which should save time at later steps), you are supposed to define clusters of sites to analyze:

BSraw.obj= clusterSites(object = BSraw.obj,
                        groups = var1, perc.samples = min.percent.observed,
                        min.sites = sites.per.island, max.dist = max.cluster.dist)

You don't have to define groups at this step, but you are able to so do (for a two-variable comparison)

3) You'll probably want to define some sort of gene annotation object. I'll let you decide how you would like to go about doing that, but I'll call that GenomicRanges object refGR. You can then annotate your sites with BSraw.obj = subsetByOverlaps(BSraw.obj, refGR). There are some extra steps, but I think the most appropriate thing for me to do in this discussion format is to point out the most relevant BiSeq functions.

4) There is a smoothing function:

predictedMeth = predictMeth(object = BSraw.obj, mc.cores=threads)

5) Then, you should perform your CpG site test:

betaResults = betaRegression(formula = ~var1,
                    link = "probit",
                    object = predictedMeth,
                    type = "BR", mc.cores=threads)

I think this is the step that takes the longest time (if I remember correctly).

6) You can then define differentially methylated regions as follows:

DMRs = findDMRs(betaResults,
                max.dist = max.cluster.dist,
                diff.dir = TRUE)

I think you will want to re-annotate the regions (but I was exporting both site and region results, so I wanted to have those for both analyses).

To be honest, you can probably get the best advice from the developer (and my own experience was to prefer testing other methods before BiSeq). However, I hope this helps!

Entering edit mode

Hi Charles, I know this is an old thread, but I'm a little desperate so I'm hoping you will reply. I'm very new to coding and R and I'm attempting to analyze a couple files, but every time I perform the beta regression step, my betaResults data ends up containing a lot of rows of that just read 'NA' rather than actually showing the p values and meth levels, like the table is supposed to. I'm not sure why this is happening, and I'm wondering if you ever ran into the same issue, or if you know how to fix it. Also, another question I have is about the specific code you used in your step 5. The BiSeq vignette uses betaResults = betaRegression(formula = ~group, link = "probit", object = predictedMeth, type = "BR") for the step 5 you wrote. In yours, however, you used var1 instead of group. Do you know why you used this instead?

Entering edit mode

To be honest, I preferred both methylKit and COHCAP over BiSeq.

I developed COHCAP, but I acknowledge that setting things up might be a bit of a challenge (and my ability to support using the code may be a limited, especially on certain weeks).

So, I think methylKit may be easier to use, and I think it is a reasonably good 1st option to try for analysis.

In other words, I don't remember actually finding a result that was uniquely beneficial from BiSeq, so I would not typically recommend using it.

Entering edit mode
5.2 years ago
Gordon Smyth ★ 7.0k

If you are familiar with RNA-seq analysis, you might consider edgeR as another possibility. edgeR can be used for differential methylation analysis of RRBS data, including chromosomal level tests:

edgeR estimates biological variation between replicate samples in a very effective way, taking advantage of methods development for RNA-seq, and provides full linear model capabilities (multiple groups, batch effects, covariates etc.

It can be used for CpG island based analyses or for analyses of preset genomic regions such as promoter regions.

Biological replicates vs technical variation

Given that you have 20 samples, it is critically important that you use methods (like BiSeq or edgeR) designed to take advantage of biological replicates and to assess differential methylation relative to biological variation between replicates. Many simple methylation tools don't do that and are instead designed to compare individual samples relative to technical variation only.


Login before adding your answer.

Traffic: 2545 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6