Integrating DMR data
3
0
Entering edit mode
7.1 years ago

Hi,

I would like to find overlapping sequences from differentially methylated regions from three different samples. Is there an easy way to do it for a non-bioinformatician?

Thank you in advance,

Matea.

alignment • 2.9k views
ADD COMMENT
0
Entering edit mode

In what format are your DMRs now, is it "chr-start-end" for all three samples or something more raw?

Also how non-bioinformatician are you? If you got a basic R-script to do this, could you modify it for your needs?

Are you actually interested in the sequences themself or just where they are and between what samples the DMRs overlap?

ADD REPLY
0
Entering edit mode

Hi Mattias Aine,

unfortunately I am very non-bioinformatician. I have my samples in "chr-start-end" format. I need the sequences to design the primers.

ADD REPLY
0
Entering edit mode

Ok but what you want to know is all the places in the genome where S1+S2, S1+S3, S2+S3 or S1-3 have overlapping DMRs so you can go on and design primers?

This is quite easy to do in R but would require that you can get to a point where you can get your data into R, install one bioconductor package and run a sequence of commands. Up for it?

ADD REPLY
0
Entering edit mode

I am always in for adventures. I'll install R. Do you have any hints how to begin?

ADD REPLY
4
Entering edit mode
7.1 years ago
Mattias Aine ▴ 620

Assuming you have the coordinates for DMRs in each of the samples and can get them into R it's fairly simple using GenomicRanges.

Here is a simple example of how you could try to get it done starting with e.g. BED-like coordinates for your peaks. Might have some bugs but if you're lucky it works straight away by copy-pasteing into R but can hopefully get you started..

In general you get data into R by providing the path to your file on interest and using one of the built-in functions like 'read.table'.

data<-read.table(file="c:/path/to/my/data/mydata.txt",sep="\t")
head(data) #for peak at top 6 rows

Load/install required package (in R)

if('GenomicRanges' %in% rownames(installed.packages() ) ) { library(GenomicRanges) } else { source("https://bioconductor.org/biocLite.R") ; biocLite("GenomicRanges") ; library(GenomicRanges) }

Create mock DMRs (substitute with own coordinates by reading in your file so it looks something like this)

DMRs1<-data.frame(chr=c("chr1","chr1","chr2","chr2"),start=c(100,500,1000,2000),end=c(100,500,1000,2000)+100,stringsAsFactors=F)
DMRs2<-data.frame(chr=c("chr1","chr1","chr2","chr2"),start=c(300,525,1000,1800),end=c(300,525,1000,1800)+100,stringsAsFactors=F)
DMRs3<-data.frame(chr=c("chr1","chr1","chr2","chr2"),start=c(350,500,1000,2200),end=c(350,500,1000,2200)+100,stringsAsFactors=F)
head(DMRs1) ##check object, your files should look something like this once read in..

Make GRanges objects

r1<-GRanges(seqnames=DMRs1[,"chr"],ranges=IRanges(start=DMRs1[,"start"],end=DMRs1[,"end"]))
r2<-GRanges(seqnames=DMRs2[,"chr"],ranges=IRanges(start=DMRs2[,"start"],end=DMRs2[,"end"]))
r3<-GRanges(seqnames=DMRs3[,"chr"],ranges=IRanges(start=DMRs3[,"start"],end=DMRs3[,"end"]))

Get DMR overlaps using coverage and convert back to GRanges

myRanges<-coverage(c(r1,r2,r3))
myRanges<-as(myRanges,"GRanges")

Only keep regions where at least 2 DMRs have some degree of overlap

myRanges<-myRanges[score(myRanges)>1] # >2 would require all three experiments to have overlapping DMR

Merge contiguous regions with variable number of overlapping peaks

#myRanges<-myRanges[width(myRanges)>50] #only keep overlaps larger than e.g. 50 bp, remove first '#' to use
myRanges<-reduce(myRanges) #merge contiguous regions

Make container for map-back

elementMetadata(myRanges)$consensusRegions2dmr1Row<-NA
elementMetadata(myRanges)$consensusRegions2dmr2Row<-NA
elementMetadata(myRanges)$consensusRegions2dmr3Row<-NA

Map back to original peaks

myMatches<-as.matrix(findOverlaps(myRanges,r1))
elementMetadata(myRanges)$consensusRegions2dmr1Row[myMatches[,1]]<-myMatches[,2]

myMatches<-as.matrix(findOverlaps(myRanges,r2))
elementMetadata(myRanges)$consensusRegions2dmr2Row[myMatches[,1]]<-myMatches[,2]

myMatches<-as.matrix(findOverlaps(myRanges,r3))
elementMetadata(myRanges)$consensusRegions2dmr3Row[myMatches[,1]]<-myMatches[,2]

Output results matrix

out<-data.frame(chr=as.character(seqnames(myRanges)),start=start(myRanges),end=end(myRanges),as(elementMetadata(myRanges),"matrix"),stringsAsFactors=F)

write.table(out,file="peaksOverlapsOut.txt",sep="\t",quote=F,row.names=F) #open in e.g. excel and check
ADD COMMENT
0
Entering edit mode

Wow! Thanks! This will keep me busy for some time. Will try it and let you know how it worked. Once again, thank you very much!

ADD REPLY
0
Entering edit mode

Hi Mattias Aine, thanks for sharing the code. It works perfectly until the last step, where I get the following:

Error in as.vector(data) : no method for coercing this S4 class to a vector

Any ideas on how to solve this problem? I've tried adding up the following into the NAMESPACE file, but that didn't help.

importMethodsFrom(GenomicRanges, as.data.frame)

ADD REPLY
0
Entering edit mode

Please add the full code chunk with error messages.

ADD REPLY
0
Entering edit mode

Loading required package: GenomicRanges Loading required package: BiocGenerics Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

IQR, mad, xtabs

The following objects are masked from ‘package:base’:

anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl,
intersect, is.unsorted, lapply, lengths, Map, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unlist, unsplit

Loading required package: S4Vectors Loading required package: stats4 Loading required package: IRanges Loading required package: GenomeInfoDb

data<-read.table(file="c:/path/to/my/data/mydata.bed",sep="\t")

Warning message: package ‘BiocGenerics’ was built under R version 3.2.2

data1<-read.table(file="c:/path/to/my/data/mydata.bed",sep="\t")
head(data)
DMRs1<-data.frame(chr=data[,1],start=data[,2],end=data[,3],stringsAsFactors=F)
DMRs2<-data.frame(chr=data1[,1],start=data1[,2],end=data1[,3],stringsAsFactors=F)
head(DMRs1)
r1<-GRanges(seqnames=DMRs1[,"chr"],ranges=IRanges(start=DMRs1[,"start"],end=DMRs1[,"end"]))
r2<-GRanges(seqnames=DMRs2[,"chr"],ranges=IRanges(start=DMRs2[,"start"],end=DMRs2[,"end"]))
myRanges<-coverage(c(r1,r2))
myRanges<-as(myRanges,"GRanges")
myRanges<-myRanges[score(myRanges)>1]
myRanges<-reduce(myRanges)
elementMetadata(myRanges)$consensusRegions2dmr1Row<-NA
elementMetadata(myRanges)$consensusRegions2dmr2Row<-NA
myMatches<-as.matrix(findOverlaps(myRanges,r1))
elementMetadata(myRanges)$consensusRegions2dmr1Row[myMatches[,1]]<-myMatches[,2]
myMatches<-as.matrix(findOverlaps(myRanges,r2))
elementMetadata(myRanges)$consensusRegions2dmr2Row[myMatches[,1]]<-myMatches[,2]
out<-data.frame(chr=as.character(seqnames(myRanges)),start=start(myRanges),end=end(myRanges),as(elementMetadata(myRanges),"matrix"),stringsAsFactors=F)

Error in as.vector(data) : no method for coercing this S4 class to a vector

ADD REPLY
1
Entering edit mode

If you just write the name of the results object 'myRanges' what does that look like?

myRanges #type this in R and press enter

Anything to write to file there?

ADD REPLY
0
Entering edit mode

This is how it looks like

GRanges object with 422 ranges and 2 metadata columns:
        seqnames                 ranges strand   | consensusRegions2dmr1Row
           <Rle>              <IRanges>  <Rle>   |                <integer>
    [1]        1     [ 949285,  949737]      *   |                        1
    [2]        1     [ 976705,  976708]      *   |                        2
    [3]        1     [1244840, 1245077]      *   |                        7
    [4]        1     [1365792, 1366320]      *   |                       10
    [5]        1     [2144696, 2145231]      *   |                       20
    ...      ...                    ...    ... ...                      ...
  [418]        9 [140175619, 140175972]      *   |                     1819
  [419]        9 [140199579, 140199944]      *   |                     1821
  [420]        9 [140311775, 140312125]      *   |                     1824
  [421]        X [  9984266,   9984622]      *   |                     3858
  [422]        X [ 49090608,  49090773]      *   |                     3872
        consensusRegions2dmr2Row
                       <integer>
    [1]                        1
    [2]                        2
    [3]                        7
    [4]                        9
    [5]                       12
    ...                      ...
  [418]                      942
  [419]                      943
  [420]                      945
  [421]                     2023
  [422]                     2034
  -------
  seqinfo: 24 sequences from an unspecified genome
ADD REPLY
1
Entering edit mode

ok, think it should work but then which one of these fails?

 a<-as.vector(seqnames(myRanges))
 b<-start(myRanges)
 c<-end(myRanges)
 d<-as(elementMetadata(myRanges),"matrix")

If none then check that they are all same length or dimensions

 length(a)
 dim(d) #number of rows should be same as length(a)

 out<-cbind(chr=a,start=b,end=c,d) #then print to file..
ADD REPLY
0
Entering edit mode
 > d<-as(elementMetadata(myRanges),"matrix")

d fails, I'm getting this error message: Error in as.vector(data) : no method for coercing this S4 class to a vector

ADD REPLY
2
Entering edit mode

Strange, works fine for me..

How bout this then?

 d<-do.call("cbind",lapply(elementMetadata(myRanges),as.vector))
ADD REPLY
0
Entering edit mode

this one works quite OK

ADD REPLY
1
Entering edit mode

Are we done then or is there anything else?

ADD REPLY
1
Entering edit mode

Yes we are done. All works great! Thanks for all the suggestions and for putting all the effort into this.

ADD REPLY
0
Entering edit mode

Hard to know if you are the same person as the original poster. If by chance you are (and you used a new account for this) you should log in to the other account and "accept" this solution (green check mark) to provide closure to this thread.

ADD REPLY
0
Entering edit mode

That's all what comes out when running your code. Thanks really much for taking the time!

ADD REPLY
1
Entering edit mode

Hi Mattias Aine, I asked my colleague Ruzhica for help with R 'cause it was a bit too much for me after all. Thank you very much for help!!!

ADD REPLY
2
Entering edit mode
7.1 years ago

If you know how to use Linux, you can install BEDTools and use intersectBed to accomplish your task.

ADD COMMENT
1
Entering edit mode
5.1 years ago

If you provide a gft file (or a gff file converted to a gtf), and a BED file, a nice option is to use rgmatch.py

This program only needs a simple command lane documented in the web page, and gives you the position of the DMR (or any other feature) in the many places you choose, such as Upstream, Promoter, TSS, 1st_exon, introns, Gene_body, TTS and Downstream. The lenght of these features can be selected by the final user Places where rgmatch intersect your features

ADD COMMENT

Login before adding your answer.

Traffic: 1953 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6