Question: Evaluate overlap of regions
1
gravatar for franc.jian
5 months ago by
franc.jian30
franc.jian30 wrote:

Dear all,

I am struggling with what should probably be a trivial problem. I have 2 seg files with the following structure:

chr    start   end  feature.1  feature.2 
1       1000  1200   0.5          0.001


chr    start   end  feature.1  feature.2  
1       1100  1400   0.7          0.004

Let's call them A and B. I want to obtain a unique file like this:

chr    start   end     feature.1.a      feature.1.b
1       1000    1100     0.5             NA
1       1100    1200       0.5            0.7
1       1200    1400     NA             0.7

If a region is covered by only A or B, then it would contain only the corresponding values and it would have NA in the other fields. Otherwise, I want to retrieve both of them. This looks close to what bedtools -intersect does, and indeed I am trying to use the package Bedr to solve this task, but it seems to leave out regions covered only in one file. Maybe GenomicRanges is another possibility. If you have any hint on this it would be very helpful to me. Thank you

R genome • 225 views
ADD COMMENTlink modified 5 months ago by benformatics700 • written 5 months ago by franc.jian30
1

See if this example in this answer helps:

A: Struggling to filter data with R, tidyverse

ADD REPLYlink written 5 months ago by zx87547.1k

Thank you, even though the operation is not symmetrical it definitely helped out with my problem. I can add it as a solution myself or you can add it and I will close the post. Thank you again.

ADD REPLYlink modified 5 months ago • written 5 months ago by franc.jian30

Feel free to answer your own question, and accept as answer.

ADD REPLYlink written 5 months ago by zx87547.1k
1

There's a --partition option in BEDOPS bedops that solves this pretty easily. Some people don't seem to like using this kit for some reason, but if you're interested, let me know and I'll put together an answer.

ADD REPLYlink modified 5 months ago • written 5 months ago by Alex Reynolds28k
2
gravatar for franc.jian
5 months ago by
franc.jian30
franc.jian30 wrote:

I managed to deal with the issue using data.table::foverlaps. I report an example from another thread.

A: Struggling to filter data with R, tidyverse

The solution needs to be adapted but essentially solves the problem.

ADD COMMENTlink written 5 months ago by franc.jian30
1
gravatar for benformatics
5 months ago by
benformatics700
ETH Zurich
benformatics700 wrote:

Given two GenomicRanges objects a and b.

library(GenomicRanges)
a <- GRanges("chr1:1000-1200")
b <- GRanges("chr1:1100-1400")
a$feature.1 <- 0.5
b$feature.1 <- 0.7

get.overlap.info <- function(a,b){
   overlap.r <- intersect(a,b)
   different.r1 <- setdiff(a,b)
   different.r2 <- setdiff(b,a)
   different.r <- c(different.r1,different.r2)
   full.r <- c(different.r,overlap.r)
   ol.a <- findOverlaps(full.r,a)
   ol.b <- findOverlaps(full.r,b)
   full.r$feature.1.a <- NA
   full.r$feature.1.b <- NA
   full.r$feature.1.a[queryHits(ol.a)] <- a$feature.1[subjectHits(ol.a)]
   full.r$feature.1.b[queryHits(ol.b)] <- b$feature.1[subjectHits(ol.b)]
   return(full.r)
}

new.ranges <- get.overlap.info(a,b)
new.ranges
GRanges object with 3 ranges and 2 metadata columns:
      seqnames    ranges strand | feature.1.a feature.1.b
         <Rle> <IRanges>  <Rle> |   <numeric>   <numeric>
  [1]     chr1 1000-1099      * |         0.5        <NA>
  [2]     chr1 1201-1400      * |        <NA>         0.7
  [3]     chr1 1100-1200      * |         0.5         0.7
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
ADD COMMENTlink written 5 months ago by benformatics700

you could also use sort() on that final object so that it is sorted

ADD REPLYlink written 5 months ago by benformatics700
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1914 users visited in the last hour