Question: Finding overlapping ranges in R
2.5 years ago by
EVR510
Earth
EVR510 wrote:

Hi,

I have a set of intervals in a data frame and a query interval. I want to find the intervals that overlap the query, and also the intervals that overlap those overlapping intervals. As an example, consider the following data frame:

df=data.frame(Id=rep("A1",23),start=c(11176,11176,11176,11176,11176,11176,11176,11177,11177,11177,11177,11177,11177,11178,11178,11179,11179,11179,11233,11233,11233,11233,11233),end=11205,11206,11206,11206,11206,11206,11207,11206,11206,11208,11206,11208,11209,11206,11206,11203,11204,11204,11263,11263,11263,11263,11264))

Suppose my query interval is 11176 to 11205. In the data frame df, I would like to find the intervals that overlap the query interval, and also the intervals that overlap those overlapping intervals.

Below is my R code, but for some reason it is not giving me the output I want. I expect the output 11179 and 11204, but somehow my code outputs only the range 11178 and 11206.

temp_start= 11176
temp_end=11206
for(i in 1:dim(df)[1])
{
  final_start=temp_start
  final_end=temp_end
  if((findInterval(final_end,c(df$start[i],df$end[i]),rightmost.closed = T,left.open = T)==1L) || (findInterval(final_start,c(df$start[i],df$end[i]),rightmost.closed = T,left.open = T)==1L))
  {
    final_start=df$start[i]
    final_end=df$end[i]
    print(final_start)
    print(final_end)
  }
}

The above code takes query_start (11176) and query_end (11206) as input. It then checks whether either temp_start or temp_end falls within an interval in the data frame df. If it does, that interval is taken, and its start and end are in turn checked against the next interval in the for loop.

Any guidance would be highly appreciated. Thanks in advance.

rna-seq overlapping-ranges R • 2.4k views
modified 2.5 years ago by poisonAlien • written 2.5 years ago by EVR

There is a typo in your first example: you need to add "c(" after "end=" in the declaration of df. Moreover, this data frame seems to contain many duplicated entries. In any case, I would recommend GRanges for working with genomic ranges.

modified 2.5 years ago • written 2.5 years ago by Giovanni M Dall'Olio

2.5 years ago by
Bergen, Norway
Michael Dondrup45k wrote:

Check out the IRanges and GenomicRanges packages in R (GenomicRanges provides the GRanges class).

See also Partial or complete overlap of two genomic ranges

Then in the data frame df, I would like find the intervals that overlap my query interval range and also intervals that overlap the overlapping intervals of query range.

This can be achieved by running the findOverlaps query of the interval set against itself, then iterating over the result and generating the first-order self-overlapping extension of the query by computing the normalized intervals for each query and its self-overlaps.
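
The idea of extending a query by the intervals that overlap it can be sketched without any package. The following is only a minimal base R illustration, not the IRanges implementation: `overlaps` and `extend_query` are hypothetical helper names, intervals are assumed to be closed, and the loop runs to a fixed point, so it goes beyond the first-order extension (stop after one pass if only first order is wanted).

```r
# De-duplicated intervals from the question (closed intervals [start, end]).
ranges <- data.frame(
  start = c(11176, 11176, 11176, 11177, 11177, 11177, 11178, 11179, 11179, 11233, 11233),
  end   = c(11205, 11206, 11207, 11206, 11208, 11209, 11206, 11203, 11204, 11263, 11264)
)

# Hypothetical helper: TRUE for each row that overlaps the closed interval [qs, qe].
overlaps <- function(ranges, qs, qe) {
  ranges$start <= qe & ranges$end >= qs
}

# Hypothetical helper: grow the query until no overlapping interval extends it further.
extend_query <- function(ranges, qs, qe) {
  repeat {
    hit   <- overlaps(ranges, qs, qe)
    new_s <- min(qs, ranges$start[hit])
    new_e <- max(qe, ranges$end[hit])
    if (new_s == qs && new_e == qe) break  # fixed point reached
    qs <- new_s
    qe <- new_e
  }
  list(start = qs, end = qe, hits = ranges[hit, ])
}

res <- extend_query(ranges, 11176, 11205)
print(res$start)
print(res$end)
print(res$hits)
```

With IRanges, the overlap test would correspond to findOverlaps and the merging step to reduce.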

modified 2.5 years ago • written 2.5 years ago by Michael Dondrup

2.5 years ago by
H.Hasani630
Freiburg, Germany
H.Hasani630 wrote:

Similar to IRanges and GRanges, you can also try the genomeIntervals package.

written 2.5 years ago by H.Hasani

2.5 years ago by
London, UK
Giovanni M Dall'Olio26k wrote:

As suggested by others, use GRanges for genomic range intersections.

> library(GenomicRanges)
> library(magrittr)  # provides %>%
> df=data.frame(Id=rep("A1",23),start=c(11176,11176,11176,11176,11176,11176,11176,11177,11177,11177,11177,11177,11177,11178,11178,11179,11179,11179,11233,11233,11233,11233,11233),end=c(11205,11206,11206,11206,11206,11206,11207,11206,11206,11208,11206,11208,11209,11206,11206,11203,11204,11204,11263,11263,11263,11263,11264))
> gr = makeGRangesFromDataFrame(df, seqnames.field="Id")
> gr
GRanges object with 23 ranges and 0 metadata columns:
       seqnames         ranges strand
          <Rle>      <IRanges>  <Rle>
   [1]       A1 [11176, 11205]      *
   [2]       A1 [11176, 11206]      *
   [3]       A1 [11176, 11206]      *
   [4]       A1 [11176, 11206]      *
   [5]       A1 [11176, 11206]      *
   ...      ...            ...    ...
  [19]       A1 [11233, 11263]      *
  [20]       A1 [11233, 11263]      *
  [21]       A1 [11233, 11263]      *
  [22]       A1 [11233, 11263]      *
  [23]       A1 [11233, 11264]      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

> gr %>% unique
GRanges object with 11 ranges and 0 metadata columns:
       seqnames         ranges strand
          <Rle>      <IRanges>  <Rle>
   [1]       A1 [11176, 11205]      *
   [2]       A1 [11176, 11206]      *
   [3]       A1 [11176, 11207]      *
   [4]       A1 [11177, 11206]      *
   [5]       A1 [11177, 11208]      *
   [6]       A1 [11177, 11209]      *
   [7]       A1 [11178, 11206]      *
   [8]       A1 [11179, 11203]      *
   [9]       A1 [11179, 11204]      *
  [10]       A1 [11233, 11263]      *
  [11]       A1 [11233, 11264]      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

> query.gr = GRanges("A1", IRanges(start=11176, end=11205))
> subsetByOverlaps(gr, query.gr)
GRanges object with 18 ranges and 0 metadata columns:
       seqnames         ranges strand
          <Rle>      <IRanges>  <Rle>
   [1]       A1 [11176, 11205]      *
   [2]       A1 [11176, 11206]      *
   [3]       A1 [11176, 11206]      *
   [4]       A1 [11176, 11206]      *
   [5]       A1 [11176, 11206]      *
   ...      ...            ...    ...
  [14]       A1 [11178, 11206]      *
  [15]       A1 [11178, 11206]      *
  [16]       A1 [11179, 11203]      *
  [17]       A1 [11179, 11204]      *
  [18]       A1 [11179, 11204]      *
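
The subsetByOverlaps call above returns only the intervals that overlap the query directly. A possible way to also pick up the intervals that overlap those hits is to run subsetByOverlaps a second time with the first result as the query. This is a sketch on a reduced, made-up subset of the data, assuming GenomicRanges is installed; `first_order`, `second_order` and `merged` are illustrative names.

```r
library(GenomicRanges)

# Reduced illustrative subset of the intervals from the question.
df <- data.frame(
  Id    = "A1",
  start = c(11176, 11177, 11178, 11179, 11233),
  end   = c(11205, 11208, 11206, 11204, 11264)
)
gr <- makeGRangesFromDataFrame(df, seqnames.field = "Id")
query.gr <- GRanges("A1", IRanges(start = 11176, end = 11205))

# Intervals that overlap the query directly.
first_order <- subsetByOverlaps(gr, query.gr)

# Intervals that overlap any of the first-order hits.
second_order <- subsetByOverlaps(gr, first_order)

# Merge the query and all hits into non-overlapping ranges.
merged <- reduce(c(query.gr, second_order))
print(merged)
```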
written 2.5 years ago by Giovanni M Dall'Olio

2.5 years ago by
poisonAlien2.6k
Asgard
poisonAlien2.6k wrote:

GRanges and IRanges are okay if your data is small, but they are too slow for larger datasets!

Use foverlaps from data.table if your data is huge. It's crazy fast.
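
For reference, a foverlaps query could look roughly like the sketch below. It assumes data.table is installed, uses a reduced, made-up subset of the intervals, and makes no claim about speed; `dt`, `query` and `hits` are illustrative names. In older data.table releases, `nomatch = 0L` was used instead of `nomatch = NULL`.

```r
library(data.table)

# Reduced illustrative subset of the intervals from the question.
dt <- data.table(
  Id    = "A1",
  start = c(11176L, 11177L, 11178L, 11179L, 11233L),
  end   = c(11205L, 11208L, 11206L, 11204L, 11264L)
)
# foverlaps requires the second table to be keyed; the last two key columns are the interval.
setkey(dt, Id, start, end)

query <- data.table(Id = "A1", start = 11176L, end = 11205L)

# type = "any" reports every kind of overlap; nomatch = NULL drops queries without a hit.
hits <- foverlaps(query, dt, type = "any", nomatch = NULL)
print(hits)
```

In the result, start and end come from dt, while the query bounds appear as i.start and i.end.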

written 2.5 years ago by poisonAlien