Question: Evaluate overlap of regions
1
gravatar for franc.jian
22 months ago by
franc.jian40
franc.jian40 wrote:

Dear all,

I am struggling with what should probably be a trivial problem. I have 2 seg files with the following structure:

chr    start   end  feature.1  feature.2 
1       1000  1200   0.5          0.001


chr    start   end  feature.1  feature.2  
1       1100  1400   0.7          0.004

Let's call them A and B. I want to obtain a unique file like this:

chr    start   end     feature.1.a      feature.1.b
1       1000    1100     0.5             NA
1       1100    1200       0.5            0.7
1       1200    1400     NA             0.7

If a region is covered by only A or B, then it would contain only the corresponding values and it would have NA in the other fields. Otherwise, I want to retrieve both of them. This looks close to what bedtools -intersect does, and indeed I am trying to use the package Bedr to solve this task, but it seems to leave out regions covered only in one file. Maybe GenomicRanges is another possibility. If you have any hint on this it would be very helpful to me. Thank you

R genome • 450 views
ADD COMMENTlink modified 22 months ago by benformatics1.9k • written 22 months ago by franc.jian40
1

See if this example in this answer helps:

A: Struggling to filter data with R, tidyverse

ADD REPLYlink written 22 months ago by zx87549.6k

Thank you, even though the operation is not symmetrical it definitely helped out with my problem. I can add it as a solution myself or you can add it and I will close the post. Thank you again.

ADD REPLYlink modified 22 months ago • written 22 months ago by franc.jian40

Feel free to answer your own question, and accept as answer.

ADD REPLYlink written 22 months ago by zx87549.6k
1

There's a --partition option in BEDOPS bedops that solves this pretty easily. Some people don't seem to like using this kit for some reason, but if you're interested, let me know and I'll put together an answer.

ADD REPLYlink modified 22 months ago • written 22 months ago by Alex Reynolds30k
2
gravatar for franc.jian
22 months ago by
franc.jian40
franc.jian40 wrote:

I managed to deal with the issue using data.table::foverlaps. I report an example from another thread.

A: Struggling to filter data with R, tidyverse

The solution needs to be adapted but essentially solves the problem.

ADD COMMENTlink written 22 months ago by franc.jian40
1
gravatar for benformatics
22 months ago by
benformatics1.9k
ETH Zurich
benformatics1.9k wrote:

Given two GenomicRanges objects a and b.

library(GenomicRanges)
a <- GRanges("chr1:1000-1200")
b <- GRanges("chr1:1100-1400")
a$feature.1 <- 0.5
b$feature.1 <- 0.7

get.overlap.info <- function(a,b){
   overlap.r <- intersect(a,b)
   different.r1 <- setdiff(a,b)
   different.r2 <- setdiff(b,a)
   different.r <- c(different.r1,different.r2)
   full.r <- c(different.r,overlap.r)
   ol.a <- findOverlaps(full.r,a)
   ol.b <- findOverlaps(full.r,b)
   full.r$feature.1.a <- NA
   full.r$feature.1.b <- NA
   full.r$feature.1.a[queryHits(ol.a)] <- a$feature.1[subjectHits(ol.a)]
   full.r$feature.1.b[queryHits(ol.b)] <- b$feature.1[subjectHits(ol.b)]
   return(full.r)
}

new.ranges <- get.overlap.info(a,b)
new.ranges
GRanges object with 3 ranges and 2 metadata columns:
      seqnames    ranges strand | feature.1.a feature.1.b
         <Rle> <IRanges>  <Rle> |   <numeric>   <numeric>
  [1]     chr1 1000-1099      * |         0.5        <NA>
  [2]     chr1 1201-1400      * |        <NA>         0.7
  [3]     chr1 1100-1200      * |         0.5         0.7
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
ADD COMMENTlink written 22 months ago by benformatics1.9k

you could also use sort() on that final object so that it is sorted

ADD REPLYlink written 22 months ago by benformatics1.9k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 823 users visited in the last hour