Avoid intersection of adjacent regions in GenomicRanges::intersect()
1
1
Entering edit mode
5.1 years ago
david.mas ▴ 20

Hi,

I may have missed something but,

I have 2 ranges objects like this:

suppressPackageStartupMessages(library(GenomicRanges))
#> Warning: package 'S4Vectors' was built under R version 3.5.1

rg1 = GenomicRanges::GRanges(seqnames = c(1,1,2),
                             IRanges(start = c(1,10,1),
                                     end = c(9,19,20))
                             )
rg1
#> GRanges object with 3 ranges and 0 metadata columns:
#>       seqnames    ranges strand
#>          <Rle> <IRanges>  <Rle>
#>   [1]        1       1-9      *
#>   [2]        1     10-19      *
#>   [3]        2      1-20      *
#>   -------
#>   seqinfo: 2 sequences from an unspecified genome; no seqlengths

rg2 = GenomicRanges::GRanges(seqnames = c(1),
                             IRanges(start = c(5),
                                     end = c(15))
)

rg2
#> GRanges object with 1 range and 0 metadata columns:
#>       seqnames    ranges strand
#>          <Rle> <IRanges>  <Rle>
#>   [1]        1      5-15      *
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths

Then I want to intesect them by maintaining the adjacent intances in the original rg1 into different instances in the result object

The default behaviour merge them in the same instance:

rg_intersect = GenomicRanges::intersect(rg1,rg2)

rg_intersect
#> GRanges object with 1 range and 0 metadata columns:
#>       seqnames    ranges strand
#>          <Rle> <IRanges>  <Rle>
#>   [1]        1      5-15      *
#>   -------
#>   seqinfo: 2 sequences from an unspecified genome; no seqlengths

I tried to merge after applying a findOverlaps method, because the result of the FO method only gives 2 hits so I would expect that it also gives 2 instances. However, it also merges the 2 things.

findOverlapPairs(rg1,rg2)
#> Pairs object with 2 pairs and 0 metadata columns:
#>           first    second
#>       <GRanges> <GRanges>
#>   [1]     1:1-9    1:5-15
#>   [2]   1:10-19    1:5-15

GenomicRanges::intersect(findOverlapPairs(rg1,rg2))
#> GRanges object with 1 range and 0 metadata columns:
#>       seqnames    ranges strand
#>          <Rle> <IRanges>  <Rle>
#>   [1]        1      5-15      *
#>   -------
#>   seqinfo: 2 sequences from an unspecified genome; no seqlengths

Created on 2019-03-25 by the reprex package (v0.2.0).

R • 1.5k views
ADD COMMENT
2
Entering edit mode
5.1 years ago
david.mas ▴ 20

Just found that I should be able to do it with the pintersect function from a Pair object. In my example it works but I am not familiar with this method so I am not sure if it would be general enough.

suppressPackageStartupMessages(library(GenomicRanges))
#> Warning: package 'S4Vectors' was built under R version 3.5.1

rg1 = GenomicRanges::GRanges(seqnames = c(1,1,2),
                             IRanges(start = c(1,10,1),
                                     end = c(9,19,20))
)

rg2 = GenomicRanges::GRanges(seqnames = c(1),
                             IRanges(start = c(5),
                                     end = c(15))
)

GenomicRanges::pintersect(findOverlapPairs(rg1,rg2))
#> GRanges object with 2 ranges and 1 metadata column:
#>       seqnames    ranges strand |       hit
#>          <Rle> <IRanges>  <Rle> | <logical>
#>   [1]        1       5-9      * |      TRUE
#>   [2]        1     10-15      * |      TRUE
#>   -------
#>   seqinfo: 2 sequences from an unspecified genome; no seqlengths

Created on 2019-03-25 by the reprex package (v0.2.0).

ADD COMMENT

Login before adding your answer.

Traffic: 1951 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6