Question: How to facilitate the performance of overlap for GRanges objects (verifiable example is given)?
0
4.4 years ago by
Chicago, IL, USA
Jurat Shahidin80 wrote:

Hi every one:

I have GRanges objects to find overlap simultaneously, I did implement function to accomplish this task, it works good so far. Now I need to figure out better way to facilitate the process, and needs to come up more compatible function to improve the re-usability of my code. However, I tried my attempt to get better function for it, but I don't happy with my current function performance. Therefore, I came for this community to get some ideas to achieve my desired output.

Here is the reproducible example to understand the problem that I faced:

# data:

``````gr1 <- GRanges(
seqnames=Rle(c("chr1", "chr2", "chr3", "chr4"), c(5, 6, 4, 3)),
ranges=IRanges(seq(1, by=9, len=18), seq(6, by=9, len=18)),
rangeName=letters[seq(1:18)], score=sample(1:25, 18, replace = FALSE))

gr2<- GRanges(
seqnames=Rle(c("chr1", "chr2", "chr3","chr4"), c(4, 7, 5, 4)),
ranges=IRanges(seq(2, by=11, len=20), seq(8, by=11, len=20)),
rangeName=letters[seq(1:20)], score=sample(1:25, 20, replace = FALSE))
gr3 <- GRanges(
seqnames=Rle(c("chr1", "chr2", "chr3","chr4"), c(8, 7, 6, 4)),
ranges=IRanges(seq(4, by=11, len=25), seq(9, by=11, len=25)),
rangeName=letters[seq(1:25)], score=sample(1:25, 25, replace = FALSE))
``````

# step 1:

``````library(magrittr)
gr1.ss <- gr1 %>% subset(score > 12)
gr1.ww <- gr1 %>% subset(score >5 & score <=12)
gr1.nn <- gr1 %>% subset(score <= 5)

targets <- list(gr2, gr3)
``````

# step 2 :

``````ov.ss <- lapply(targets, function(ele_) {
res <- as(findOverlaps(gr1.ss, ele_), "List")
})
ov.ww <- lapply(targets, function(ele_) {
res <- as(findOverlaps(gr1.ww, ele_), "List")
})
``````

this is my attempt to facilitate ov.ss, ov.ww can happen one wrapper function, so I can loop through gr1.res by finding match of its name, and internally call ov .

# step 3 - my attempt (need to be improved) :

``````gr1.res <- list('ss'=gr1.ss, 'ww'=gr1.ww, 'noise'=gr1.nn)
for(i in 1:length((gr1.res))) {
query <- S4Vectors::getListElement(gr1.res, i)
if(names(query)=='nn') {
message("do not process, and directly export as data.frame")
out <- unique(query) %>% as.data.frame
}
ov <- lapply(targets, function(ele_) {
res <- as(findOverlaps(query, ele_), "List")
})
ov
}
``````

I don't want to process `gr1.nn` and just directly export them as data.frame objects. I want to integrate doing ov.ss and ov.ww in one wrapper functions, so I want to come up new way to do this. How can I facilitate my attempt solution? How can I have more compatible function for facilitating my desired attempt. I will be grateful if any one can give me any enlightening solution, possible suggestion or ideas to achieve my desired functions. Thanks a lot

Best regards:

Jurat

modified 4.4 years ago • written 4.4 years ago by Jurat Shahidin80
3
4.4 years ago by
London, UK
Giovanni M Dall'Olio27k wrote:

``````> intersect(intersect(gr1, gr2), gr3)
GRanges object with 6 ranges and 0 metadata columns:
seqnames    ranges strand
<Rle> <IRanges>  <Rle>
[1]     chr1  [ 4,  6]      *
[2]     chr1  [15, 15]      *
[3]     chr1  [19, 19]      *
[4]     chr1  [28, 30]      *
[5]     chr1  [37, 41]      *
[6]     chr2  [92, 96]      *
-------
seqinfo: 4 sequences from an unspecified genome; no seqlengths
``````

Just to make sure that the order of the intersections does not matter:

``````> intersect(intersect(gr3, gr2), gr1)
GRanges object with 6 ranges and 0 metadata columns:
seqnames    ranges strand
<Rle> <IRanges>  <Rle>
[1]     chr1  [ 4,  6]      *
[2]     chr1  [15, 15]      *
[3]     chr1  [19, 19]      *
[4]     chr1  [28, 30]      *
[5]     chr1  [37, 41]      *
[6]     chr2  [92, 96]      *
-------
seqinfo: 4 sequences from an unspecified genome; no seqlengths
``````