Question: Integrating DMR data
0
gravatar for Matea Mikecin
3.3 years ago by
Germany/Heidelberg/DKFZ
Matea Mikecin10 wrote:

Hi,

I would like to find overlapping sequences from differentially methylated regions from three different samples. Is there an easy way to do it for a non-bioinformatician?

Thank you in advance,

Matea.

alignment • 1.1k views
ADD COMMENTlink modified 15 months ago by Antonio R. Franco4.5k • written 3.3 years ago by Matea Mikecin10

In what format are your DMRs now, is it "chr-start-end" for all three samples or something more raw?

Also how non-bioinformatician are you? If you got a basic R-script to do this, could you modify it for your needs?

Are you actually interested in the sequences themself or just where they are and between what samples the DMRs overlap?

ADD REPLYlink written 3.3 years ago by Mattias Aine570

Hi Mattias Aine,

unfortunately I am very non-bioinformatician. I have my samples in "chr-start-end" format. I need the sequences to design the primers.

ADD REPLYlink written 3.3 years ago by Matea Mikecin10

Ok but what you want to know is all the places in the genome where S1+S2, S1+S3, S2+S3 or S1-3 have overlapping DMRs so you can go on and design primers?

This is quite easy to do in R but would require that you can get to a point where you can get your data into R, install one bioconductor package and run a sequence of commands. Up for it?

ADD REPLYlink written 3.3 years ago by Mattias Aine570

I am always in for adventures. I'll install R. Do you have any hints how to begin?

ADD REPLYlink written 3.3 years ago by Matea Mikecin10
4
gravatar for Mattias Aine
3.3 years ago by
Mattias Aine570
Sweden
Mattias Aine570 wrote:

Assuming you have the coordinates for DMRs in each of the samples and can get them into R it's fairly simple using GenomicRanges.

Here is a simple example of how you could try to get it done starting with e.g. BED-like coordinates for your peaks. Might have some bugs but if you're lucky it works straight away by copy-pasteing into R but can hopefully get you started..

In general you get data into R by providing the path to your file on interest and using one of the built-in functions like 'read.table'.

data<-read.table(file="c:/path/to/my/data/mydata.txt",sep="\t")
head(data) #for peak at top 6 rows

Load/install required package (in R)

if('GenomicRanges' %in% rownames(installed.packages() ) ) { library(GenomicRanges) } else { source("https://bioconductor.org/biocLite.R") ; biocLite("GenomicRanges") ; library(GenomicRanges) }

Create mock DMRs (substitute with own coordinates by reading in your file so it looks something like this)

DMRs1<-data.frame(chr=c("chr1","chr1","chr2","chr2"),start=c(100,500,1000,2000),end=c(100,500,1000,2000)+100,stringsAsFactors=F)
DMRs2<-data.frame(chr=c("chr1","chr1","chr2","chr2"),start=c(300,525,1000,1800),end=c(300,525,1000,1800)+100,stringsAsFactors=F)
DMRs3<-data.frame(chr=c("chr1","chr1","chr2","chr2"),start=c(350,500,1000,2200),end=c(350,500,1000,2200)+100,stringsAsFactors=F)
head(DMRs1) ##check object, your files should look something like this once read in..

Make GRanges objects

r1<-GRanges(seqnames=DMRs1[,"chr"],ranges=IRanges(start=DMRs1[,"start"],end=DMRs1[,"end"]))
r2<-GRanges(seqnames=DMRs2[,"chr"],ranges=IRanges(start=DMRs2[,"start"],end=DMRs2[,"end"]))
r3<-GRanges(seqnames=DMRs3[,"chr"],ranges=IRanges(start=DMRs3[,"start"],end=DMRs3[,"end"]))

Get DMR overlaps using coverage and convert back to GRanges

myRanges<-coverage(c(r1,r2,r3))
myRanges<-as(myRanges,"GRanges")

Only keep regions where at least 2 DMRs have some degree of overlap

myRanges<-myRanges[score(myRanges)>1] # >2 would require all three experiments to have overlapping DMR

Merge contiguous regions with variable number of overlapping peaks

#myRanges<-myRanges[width(myRanges)>50] #only keep overlaps larger than e.g. 50 bp, remove first '#' to use
myRanges<-reduce(myRanges) #merge contiguous regions

Make container for map-back

elementMetadata(myRanges)$consensusRegions2dmr1Row<-NA
elementMetadata(myRanges)$consensusRegions2dmr2Row<-NA
elementMetadata(myRanges)$consensusRegions2dmr3Row<-NA

Map back to original peaks

myMatches<-as.matrix(findOverlaps(myRanges,r1))
elementMetadata(myRanges)$consensusRegions2dmr1Row[myMatches[,1]]<-myMatches[,2]

myMatches<-as.matrix(findOverlaps(myRanges,r2))
elementMetadata(myRanges)$consensusRegions2dmr2Row[myMatches[,1]]<-myMatches[,2]

myMatches<-as.matrix(findOverlaps(myRanges,r3))
elementMetadata(myRanges)$consensusRegions2dmr3Row[myMatches[,1]]<-myMatches[,2]

Output results matrix

out<-data.frame(chr=as.character(seqnames(myRanges)),start=start(myRanges),end=end(myRanges),as(elementMetadata(myRanges),"matrix"),stringsAsFactors=F)

write.table(out,file="peaksOverlapsOut.txt",sep="\t",quote=F,row.names=F) #open in e.g. excel and check
ADD COMMENTlink written 3.3 years ago by Mattias Aine570

Wow! Thanks! This will keep me busy for some time. Will try it and let you know how it worked. Once again, thank you very much!

ADD REPLYlink written 3.3 years ago by Matea Mikecin10

Hi Mattias Aine, thanks for sharing the code. It works perfectly until the last step, where I get the following:

Error in as.vector(data) : no method for coercing this S4 class to a vector

Any ideas on how to solve this problem? I've tried adding up the following into the NAMESPACE file, but that didn't help.

importMethodsFrom(GenomicRanges, as.data.frame)

ADD REPLYlink written 3.3 years ago by Ruzhica10

Please add the full code chunk with error messages.

ADD REPLYlink written 3.3 years ago by Mattias Aine570

Loading required package: GenomicRanges Loading required package: BiocGenerics Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

IQR, mad, xtabs

The following objects are masked from ‘package:base’:

anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl,
intersect, is.unsorted, lapply, lengths, Map, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unlist, unsplit

Loading required package: S4Vectors Loading required package: stats4 Loading required package: IRanges Loading required package: GenomeInfoDb

data<-read.table(file="c:/path/to/my/data/mydata.bed",sep="\t")

Warning message: package ‘BiocGenerics’ was built under R version 3.2.2

data1<-read.table(file="c:/path/to/my/data/mydata.bed",sep="\t")
head(data)
DMRs1<-data.frame(chr=data[,1],start=data[,2],end=data[,3],stringsAsFactors=F)
DMRs2<-data.frame(chr=data1[,1],start=data1[,2],end=data1[,3],stringsAsFactors=F)
head(DMRs1)
r1<-GRanges(seqnames=DMRs1[,"chr"],ranges=IRanges(start=DMRs1[,"start"],end=DMRs1[,"end"]))
r2<-GRanges(seqnames=DMRs2[,"chr"],ranges=IRanges(start=DMRs2[,"start"],end=DMRs2[,"end"]))
myRanges<-coverage(c(r1,r2))
myRanges<-as(myRanges,"GRanges")
myRanges<-myRanges[score(myRanges)>1]
myRanges<-reduce(myRanges)
elementMetadata(myRanges)$consensusRegions2dmr1Row<-NA
elementMetadata(myRanges)$consensusRegions2dmr2Row<-NA
myMatches<-as.matrix(findOverlaps(myRanges,r1))
elementMetadata(myRanges)$consensusRegions2dmr1Row[myMatches[,1]]<-myMatches[,2]
myMatches<-as.matrix(findOverlaps(myRanges,r2))
elementMetadata(myRanges)$consensusRegions2dmr2Row[myMatches[,1]]<-myMatches[,2]
out<-data.frame(chr=as.character(seqnames(myRanges)),start=start(myRanges),end=end(myRanges),as(elementMetadata(myRanges),"matrix"),stringsAsFactors=F)

Error in as.vector(data) : no method for coercing this S4 class to a vector

ADD REPLYlink modified 3.3 years ago • written 3.3 years ago by Ruzhica10
1

If you just write the name of the results object 'myRanges' what does that look like?

myRanges #type this in R and press enter

Anything to write to file there?

ADD REPLYlink written 3.3 years ago by Mattias Aine570

This is how it looks like

GRanges object with 422 ranges and 2 metadata columns:
        seqnames                 ranges strand   | consensusRegions2dmr1Row
           <Rle>              <IRanges>  <Rle>   |                <integer>
    [1]        1     [ 949285,  949737]      *   |                        1
    [2]        1     [ 976705,  976708]      *   |                        2
    [3]        1     [1244840, 1245077]      *   |                        7
    [4]        1     [1365792, 1366320]      *   |                       10
    [5]        1     [2144696, 2145231]      *   |                       20
    ...      ...                    ...    ... ...                      ...
  [418]        9 [140175619, 140175972]      *   |                     1819
  [419]        9 [140199579, 140199944]      *   |                     1821
  [420]        9 [140311775, 140312125]      *   |                     1824
  [421]        X [  9984266,   9984622]      *   |                     3858
  [422]        X [ 49090608,  49090773]      *   |                     3872
        consensusRegions2dmr2Row
                       <integer>
    [1]                        1
    [2]                        2
    [3]                        7
    [4]                        9
    [5]                       12
    ...                      ...
  [418]                      942
  [419]                      943
  [420]                      945
  [421]                     2023
  [422]                     2034
  -------
  seqinfo: 24 sequences from an unspecified genome
ADD REPLYlink written 3.3 years ago by Ruzhica10
1

ok, think it should work but then which one of these fails?

 a<-as.vector(seqnames(myRanges))
 b<-start(myRanges)
 c<-end(myRanges)
 d<-as(elementMetadata(myRanges),"matrix")

If none then check that they are all same length or dimensions

 length(a)
 dim(d) #number of rows should be same as length(a)

 out<-cbind(chr=a,start=b,end=c,d) #then print to file..
ADD REPLYlink written 3.3 years ago by Mattias Aine570
 > d<-as(elementMetadata(myRanges),"matrix")

d fails, I'm getting this error message: Error in as.vector(data) : no method for coercing this S4 class to a vector

ADD REPLYlink written 3.3 years ago by Ruzhica10
2

Strange, works fine for me..

How bout this then?

 d<-do.call("cbind",lapply(elementMetadata(myRanges),as.vector))
ADD REPLYlink written 3.3 years ago by Mattias Aine570

this one works quite OK

ADD REPLYlink written 3.3 years ago by Ruzhica10
1

Are we done then or is there anything else?

ADD REPLYlink written 3.3 years ago by Mattias Aine570
1

Yes we are done. All works great! Thanks for all the suggestions and for putting all the effort into this.

ADD REPLYlink written 3.3 years ago by Ruzhica10

Hard to know if you are the same person as the original poster. If by chance you are (and you used a new account for this) you should log in to the other account and "accept" this solution (green check mark) to provide closure to this thread.

ADD REPLYlink written 3.3 years ago by genomax85k

That's all what comes out when running your code. Thanks really much for taking the time!

ADD REPLYlink written 3.3 years ago by Ruzhica10
1

Hi Mattias Aine, I asked my colleague Ruzhica for help with R 'cause it was a bit too much for me after all. Thank you very much for help!!!

ADD REPLYlink written 3.3 years ago by Matea Mikecin10
2
gravatar for xieshaojun0621
3.3 years ago by
United States
xieshaojun0621170 wrote:

If you know how to use Linux, you can install BEDTools and use intersectBed to accomplish your task.

ADD COMMENTlink written 3.3 years ago by xieshaojun0621170
1
gravatar for Antonio R. Franco
15 months ago by
Spain. Universidad de Córdoba
Antonio R. Franco4.5k wrote:

If you provide a gft file (or a gff file converted to a gtf), and a BED file, a nice option is to use rgmatch.py

This program only needs a simple command lane documented in the web page, and gives you the position of the DMR (or any other feature) in the many places you choose, such as Upstream, Promoter, TSS, 1st_exon, introns, Gene_body, TTS and Downstream. The lenght of these features can be selected by the final user Places where rgmatch intersect your features

ADD COMMENTlink written 15 months ago by Antonio R. Franco4.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 851 users visited in the last hour