How to facilitate the performance of overlap for GRanges objects (verifiable example is given)?
1
0
Entering edit mode
5.3 years ago

Hi every one:

I have GRanges objects to find overlap simultaneously, I did implement function to accomplish this task, it works good so far. Now I need to figure out better way to facilitate the process, and needs to come up more compatible function to improve the re-usability of my code. However, I tried my attempt to get better function for it, but I don't happy with my current function performance. Therefore, I came for this community to get some ideas to achieve my desired output.

Here is the reproducible example to understand the problem that I faced:

data:

gr1 <- GRanges(
      seqnames=Rle(c("chr1", "chr2", "chr3", "chr4"), c(5, 6, 4, 3)),
      ranges=IRanges(seq(1, by=9, len=18), seq(6, by=9, len=18)),
      rangeName=letters[seq(1:18)], score=sample(1:25, 18, replace = FALSE))

gr2<- GRanges(
  seqnames=Rle(c("chr1", "chr2", "chr3","chr4"), c(4, 7, 5, 4)),
  ranges=IRanges(seq(2, by=11, len=20), seq(8, by=11, len=20)),
  rangeName=letters[seq(1:20)], score=sample(1:25, 20, replace = FALSE))
gr3 <- GRanges(
  seqnames=Rle(c("chr1", "chr2", "chr3","chr4"), c(8, 7, 6, 4)),
  ranges=IRanges(seq(4, by=11, len=25), seq(9, by=11, len=25)),
  rangeName=letters[seq(1:25)], score=sample(1:25, 25, replace = FALSE))

step 1:

library(magrittr)
gr1.ss <- gr1 %>% subset(score > 12) 
gr1.ww <- gr1 %>% subset(score >5 & score <=12)
gr1.nn <- gr1 %>% subset(score <= 5)

 targets <- list(gr2, gr3)

step 2 :

ov.ss <- lapply(targets, function(ele_) {
  res <- as(findOverlaps(gr1.ss, ele_), "List")
})
ov.ww <- lapply(targets, function(ele_) {
  res <- as(findOverlaps(gr1.ww, ele_), "List")
})

this is my attempt to facilitate ov.ss, ov.ww can happen one wrapper function, so I can loop through gr1.res by finding match of its name, and internally call ov .

step 3 - my attempt (need to be improved) :

gr1.res <- list('ss'=gr1.ss, 'ww'=gr1.ww, 'noise'=gr1.nn)
for(i in 1:length((gr1.res))) {
  query <- S4Vectors::getListElement(gr1.res, i)
  if(names(query)=='nn') {
     message("do not process, and directly export as data.frame")
     out <- unique(query) %>% as.data.frame
  }
  ov <- lapply(targets, function(ele_) {
      res <- as(findOverlaps(query, ele_), "List")
    })
  ov
}

I don't want to process gr1.nn and just directly export them as data.frame objects. I want to integrate doing ov.ss and ov.ww in one wrapper functions, so I want to come up new way to do this. How can I facilitate my attempt solution? How can I have more compatible function for facilitating my desired attempt. I will be grateful if any one can give me any enlightening solution, possible suggestion or ideas to achieve my desired functions. Thanks a lot

Best regards:

Jurat

r GRanges overlap performance function • 1.4k views
ADD COMMENT
3
Entering edit mode
5.3 years ago

What about the following:

> intersect(intersect(gr1, gr2), gr3)
GRanges object with 6 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1  [ 4,  6]      *
  [2]     chr1  [15, 15]      *
  [3]     chr1  [19, 19]      *
  [4]     chr1  [28, 30]      *
  [5]     chr1  [37, 41]      *
  [6]     chr2  [92, 96]      *
  -------
  seqinfo: 4 sequences from an unspecified genome; no seqlengths

Just to make sure that the order of the intersections does not matter:

> intersect(intersect(gr3, gr2), gr1)
GRanges object with 6 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1  [ 4,  6]      *
  [2]     chr1  [15, 15]      *
  [3]     chr1  [19, 19]      *
  [4]     chr1  [28, 30]      *
  [5]     chr1  [37, 41]      *
  [6]     chr2  [92, 96]      *
  -------
  seqinfo: 4 sequences from an unspecified genome; no seqlengths
ADD COMMENT

Login before adding your answer.

Traffic: 2262 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6