Question

How to do DMR analysis with BiSeq (for RRBS methylation data)?

6.6 years ago

ahmad mousavi ▴ 800

Hi

I have 20 RRBS samples on which I want to do differentially methylated region (DMR) analysis with BiSeq or another R package, but the manual does not give enough detail. I want to run the DMR analysis on all chromosomes, not on specific regions.

Does anybody have experience with BiSeq? The manual only gives examples like these:

> readBismark(files, colData)
> metadata <- list(Sequencer = "Instrument", Year = "2013")
> rowRanges <- GRanges(seqnames = "chr1",ranges = IRanges(start = c(1,2,3), end = c(1,2,3)))
> colData <- DataFrame(group = c("cancer", "control"), row.names = c("sample_1", "sample_2"))
> totalReads <- matrix(c(rep(10L, 3), rep(5L, 3)), ncol = 2)
> methReads <- matrix(c(rep(5L, 3), rep(5L, 3)), ncol = 2)
> BSraw(metadata = metadata,rowRanges = rowRanges,colData = colData,totalReads = totalReads,methReads = methReads)

sequencing DMR methylation • 4.2k views

updated 5.5 years ago by Charles Warden • written 6.6 years ago by ahmad mousavi

This does not answer your BiSeq question directly, but I've been very happy using metilene for DMR calling.

6.6 years ago by Chris Miller

score 3 · Answer 1 · 2019-02-05

My personal preference is to use methylKit or COHCAP for identifying Differentially Methylated Regions.

However, I have tested BiSeq for one or two projects.

I think the short answer can be described as follows:

1) First, I think you need to save your BSraw() object.

I created mine a slightly different way (from the Bismark coverage files), so my advice is in terms of my own experience:

BSraw.obj = readBismark(comp.files, as.character(sample.table$sampleID))
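For reference, here is a minimal sketch of that call. It assumes Bismark coverage (.cov) files with six tab-separated columns (chromosome, start, end, percent methylation, methylated count, unmethylated count). The two toy files, the sample IDs, and the group labels are made up so that the example is self-contained; in practice comp.files would point to your own 20 files.

```r
library(BiSeq)

# Hypothetical sample sheet; replace with your own sample IDs and groups
sample.table <- data.frame(sampleID = c("sample_1", "sample_2"),
                           group = c("cancer", "control"),
                           stringsAsFactors = FALSE)

# Write two toy Bismark coverage files (made-up counts) into a temp folder
cov.dir <- tempfile("bismark_cov_")
dir.create(cov.dir)
comp.files <- file.path(cov.dir, paste0(sample.table$sampleID, ".bismark.cov"))
toy.cov <- data.frame(chr = "chr1",
                      start = c(100L, 105L, 110L),
                      end = c(100L, 105L, 110L),
                      perc = c(50, 20, 100),
                      meth = c(5L, 2L, 10L),
                      unmeth = c(5L, 8L, 0L))
for (f in comp.files) {
  write.table(toy.cov, f, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}

# Passing a DataFrame as colData keeps the group labels with the samples
colData <- DataFrame(group = sample.table$group,
                     row.names = sample.table$sampleID)
BSraw.obj <- readBismark(comp.files, colData)
BSraw.obj
```

Storing the group in colData means the same variable can be reused later in clusterSites() and in the betaRegression() formula.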

2) At least if you follow the instructions (which should save time at later steps), you are supposed to define clusters of sites to analyze:

BSraw.obj = clusterSites(object = BSraw.obj,
                         groups = var1, perc.samples = min.percent.observed,
                         min.sites = sites.per.island, max.dist = max.cluster.dist)

You don't have to define groups at this step, but you are able to do so (for a two-variable comparison).
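To make the placeholders concrete, here is a sketch of the clustering step run on the rrbs example data that ships with BiSeq. The parameter values (80% of samples, 20 sites, 100 bp) are the ones I remember from the vignette example, not recommendations for your data, and the subset to 1000 sites is only there to keep the run short.

```r
library(BiSeq)
data(rrbs)                     # example BSraw object shipped with BiSeq

rrbs.small <- rrbs[1:1000, ]   # subset only to keep the example fast
var1 <- colData(rrbs.small)$group

# Illustrative values for the placeholders used in the call above
min.percent.observed <- 4/5    # site must be covered in 80% of samples per group
sites.per.island <- 20         # minimum number of covered sites per cluster
max.cluster.dist <- 100        # maximum distance (bp) between neighbouring sites

rrbs.clust <- clusterSites(object = rrbs.small,
                           groups = var1,
                           perc.samples = min.percent.observed,
                           min.sites = sites.per.island,
                           max.dist = max.cluster.dist)

# Genomic ranges of the clusters that were found
clusterSitesToGR(rrbs.clust)
```

Because clustering is run on the whole object, this covers all chromosomes present in the data; nothing restricts it to preselected regions.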

3) You'll probably want to define some sort of gene annotation object. I'll let you decide how you would like to go about doing that, but I'll call that GenomicRanges object refGR. You can then annotate your sites with BSraw.obj = subsetByOverlaps(BSraw.obj, refGR). There are some extra steps, but I think the most appropriate thing for me to do in this discussion format is to point out the most relevant BiSeq functions.
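As a toy illustration of what subsetByOverlaps() does here, the sketch below builds the small BSraw object from the question and a hand-made refGR. The coordinates and the gene name are hypothetical; a real refGR would come from whatever annotation source you prefer. Note that the call keeps only the sites that overlap the annotation; it does not attach gene names to them.

```r
library(BiSeq)

# Toy BSraw object, same layout as the manual example quoted in the question
rowRanges <- GRanges(seqnames = "chr1",
                     ranges = IRanges(start = c(1, 2, 3), end = c(1, 2, 3)))
colData <- DataFrame(group = c("cancer", "control"),
                     row.names = c("sample_1", "sample_2"))
totalReads <- matrix(c(rep(10L, 3), rep(5L, 3)), ncol = 2)
methReads <- matrix(c(rep(5L, 3), rep(5L, 3)), ncol = 2)
toy <- BSraw(metadata = list(Sequencer = "Instrument", Year = "2013"),
             rowRanges = rowRanges, colData = colData,
             totalReads = totalReads, methReads = methReads)

# Hypothetical annotation: one made-up "gene" covering positions 2-3
refGR <- GRanges(seqnames = "chr1",
                 ranges = IRanges(start = 2, end = 3),
                 gene = "geneA")

# Keep only the CpG sites that fall inside the annotated region
toy.annot <- subsetByOverlaps(toy, refGR)
nrow(toy.annot)
```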

4) There is a smoothing function:

predictedMeth = predictMeth(object = BSraw.obj, mc.cores=threads)

5) Then, you should perform your CpG site test:

betaResults = betaRegression(formula = ~var1,
                    link = "probit",
                    object = predictedMeth,
                    type = "BR", mc.cores=threads)

I think this is the step that takes the longest time (if I remember correctly).

6) You can then define differentially methylated regions as follows:

DMRs = findDMRs(betaResults,
                max.dist = max.cluster.dist,
                diff.dir = TRUE)
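For completeness, here is a sketch of the steps that, if I recall the BiSeq vignette correctly, sit between betaRegression() and findDMRs(): estimating the correlation between neighbouring sites, testing the clusters, and trimming them. It uses the precomputed example objects shipped with the package (betaResults and vario), and the sill and FDR values are the vignette's illustrative settings rather than tuned choices. Please check the function arguments against your installed version.

```r
library(BiSeq)
data(betaResults)   # precomputed site-level results from betaRegression()
data(vario)         # precomputed variogram from a null (resampled) comparison

# Smooth the null variogram; sill = 0.9 is the vignette's example value
vario.sm <- smoothVariogram(vario, sill = 0.9)

# Attach the p-values of the real comparison to the smoothed variogram
vario.aux <- makeVariogram(betaResults, make.variogram = FALSE)
vario.sm$pValsList <- vario.aux$pValsList

# Estimate the correlation between test statistics of neighbouring sites
locCor <- estLocCor(vario.sm)

# Test whole clusters, then trim non-significant sites within rejected clusters
clusters.rej <- testClusters(locCor, FDR.cluster = 0.1)
clusters.trimmed <- trimClusters(clusters.rej, FDR.loc = 0.05)

# Merge neighbouring significant sites into DMRs
DMRs.example <- findDMRs(clusters.trimmed,
                         max.dist = 100,
                         diff.dir = TRUE)
DMRs.example
```

On your own data, the null variogram would have to be built from a betaRegression() run on resampled group labels, which roughly doubles the time spent in the regression step.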

I think you will want to re-annotate the regions (but I was exporting both site and region results, so I wanted to have those for both analyses).
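A minimal sketch of re-annotating and exporting the regions, using the example DMRs and promoters objects that ship with BiSeq. The in.promoter column name and the output file name are my own choices for illustration.

```r
library(BiSeq)
data(DMRs)        # example output of findDMRs() shipped with BiSeq
data(promoters)   # example promoter annotation (GRanges) shipped with BiSeq

# Flag each DMR that overlaps at least one annotated promoter
DMRs$in.promoter <- overlapsAny(DMRs, promoters)

# Convert to a plain table and write it out as tab-delimited text
dmr.table <- as.data.frame(DMRs)
out.file <- file.path(tempdir(), "BiSeq_DMRs.txt")
write.table(dmr.table, out.file, sep = "\t", quote = FALSE, row.names = FALSE)
head(dmr.table)
```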

To be honest, you can probably get the best advice from the developer (and my own experience was to prefer testing other methods before BiSeq). However, I hope this helps!

score 2 · Answer 2 · 2019-02-03

If you are familiar with RNA-seq analysis, you might consider edgeR as another possibility. edgeR can be used for differential methylation analysis of RRBS data, including chromosomal level tests:

https://f1000research.com/articles/6-2055/v2

edgeR estimates biological variation between replicate samples in a very effective way, taking advantage of methods developed for RNA-seq, and provides full linear model capabilities (multiple groups, batch effects, covariates, etc.).

It can be used for CpG island based analyses or for analyses of preset genomic regions such as promoter regions.
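To give a feel for the approach, here is a rough sketch along the lines of the linked article, run on simulated counts so that it is self-contained. The sample names, group labels, and all counts are made up, and the simulation contains no true differences; with real data the methylated and unmethylated counts would come from your Bismark coverage files. It assumes an edgeR version that provides modelMatrixMeth().

```r
library(edgeR)
set.seed(1)

n.loci <- 200
samples <- paste0("S", 1:6)
group <- factor(rep(c("control", "cancer"), each = 3),
                levels = c("control", "cancer"))

# Simulated coverage and methylated counts, with sample-to-sample variation
coverage <- matrix(rpois(n.loci * 6, lambda = 30) + 1L, nrow = n.loci)
level <- matrix(rbeta(n.loci * 6, shape1 = 8, shape2 = 8), nrow = n.loci)
me <- matrix(rbinom(n.loci * 6, size = coverage, prob = level), nrow = n.loci)
un <- coverage - me

# Two columns per sample: methylated (Me) then unmethylated (Un)
counts <- matrix(0L, nrow = n.loci, ncol = 12)
counts[, seq(1, 12, by = 2)] <- me
counts[, seq(2, 12, by = 2)] <- un
colnames(counts) <- paste0(rep(samples, each = 2), c("-Me", "-Un"))
rownames(counts) <- paste0("locus", seq_len(n.loci))

y <- DGEList(counts)
# Give the Me and Un columns of each sample the same library size
lib <- y$samples$lib.size
total.lib <- 0.5 * lib[seq(1, 12, by = 2)] + 0.5 * lib[seq(2, 12, by = 2)]
y$samples$lib.size <- rep(total.lib, each = 2)

# Expand the sample-level design to the Me/Un column layout
designSL <- model.matrix(~ 0 + group)
design <- modelMatrixMeth(designSL)

y <- estimateDisp(y, design, trend = "none")
fit <- glmFit(y, design)
contr <- makeContrasts(groupcancer - groupcontrol, levels = design)
lrt <- glmLRT(fit, contrast = contr)
topTags(lrt)
```

For region-level tests, the counts can be summed over each region (for example with rowsum()) before building the DGEList.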

Biological replicates vs technical variation

Given that you have 20 samples, it is critically important that you use methods (like BiSeq or edgeR) designed to take advantage of biological replicates and to assess differential methylation relative to biological variation between replicates. Many simple methylation tools don't do that and are instead designed to compare individual samples relative to technical variation only.
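A small base-R simulation can illustrate the difference. It uses one CpG site and 20 samples whose true methylation levels vary between replicates but do not differ between groups. Pooling the reads per group and running Fisher's exact test treats every read as independent, so it only sees technical (sampling) variation and will typically report a much smaller p-value than a replicate-aware model. The quasi-binomial GLM here is only a simple stand-in for that idea; it is not what BiSeq (beta regression) or edgeR (negative binomial models) actually fit, and all numbers are made up.

```r
set.seed(42)

n.rep <- 10     # replicates per group (20 samples in total)
depth <- 200    # reads covering the site in each sample
group <- rep(c("cancer", "control"), each = n.rep)

# Every sample's true level comes from the same distribution:
# there is biological variation but no real group difference
true.level <- rbeta(2 * n.rep, shape1 = 5, shape2 = 5)
meth <- rbinom(2 * n.rep, size = depth, prob = true.level)
unmeth <- depth - meth

# Naive approach: pool reads per group, ignoring replicate structure
pooled <- rbind(cancer = c(sum(meth[group == "cancer"]),
                           sum(unmeth[group == "cancer"])),
                control = c(sum(meth[group == "control"]),
                            sum(unmeth[group == "control"])))
p.pooled <- fisher.test(pooled)$p.value

# Replicate-aware approach: the dispersion is estimated from the samples
fit <- glm(cbind(meth, unmeth) ~ group, family = quasibinomial)
p.replicate <- coef(summary(fit))["groupcontrol", "Pr(>|t|)"]

c(pooled = p.pooled, replicate.aware = p.replicate)
```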