How to obtain distinct/unique rows from a GenomicRanges object
2.6 years ago
gundalav ▴ 360

I have the following GenomicRanges object created with this:

library(GenomicRanges)
gr <- GRanges(seqnames = "chr1", strand = c("+", "-","-", "+"),ranges = IRanges(start = c(1,3,3,5), width = 3))
gr

That looks like this:

GRanges object with 4 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1       1-3      +
  [2]     chr1       3-5      -
  [3]     chr1       3-5      -
  [4]     chr1       5-7      +

What I want to do is to obtain the unique rows from there, yielding this (hand-coded)

GRanges object with 3 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1       1-3      +
  [2]     chr1       3-5      -
  [3]     chr1       5-7      +

How can I achieve that? In reality, I have around 9 million rows to process.

I can use this method, but it is very, very slow:

library(tidyverse)
# as.tibble() is deprecated in tibble; as_tibble() is the current name
gr %>%
  as_tibble() %>%
  distinct()
R bioconductor GenomicRange • 2.4k views
2.6 years ago
zx8754 11k

Use unique as usual (no need for tidyverse):

unique(gr)
# GRanges object with 3 ranges and 0 metadata columns:
#       seqnames    ranges strand
#          <Rle> <IRanges>  <Rle>
#   [1]     chr1       1-3      +
#   [2]     chr1       3-5      -
#   [3]     chr1       5-7      +
#   -------
#   seqinfo: 1 sequence from an unspecified genome; no seqlengths

Then convert to data.frame if needed:

data.frame(unique(gr))
#     seqnames start end width strand
#   1     chr1     1   3     3      +
#   2     chr1     3   5     3      -
#   3     chr1     5   7     3      +
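
If you also want to see which rows were dropped, or how many, duplicated() works on GRanges objects too. A minimal sketch using the gr from the question (the names dup and gr_unique are only illustrative):

```r
library(GenomicRanges)

# Same object as in the question
gr <- GRanges(seqnames = "chr1",
              strand = c("+", "-", "-", "+"),
              ranges = IRanges(start = c(1, 3, 3, 5), width = 3))

# TRUE for every range that repeats an earlier one
# (same seqnames, start, end and strand)
dup <- duplicated(gr)
dup
# [1] FALSE FALSE  TRUE FALSE

# Number of ranges that would be dropped
sum(dup)
# [1] 1

# Keep only the first occurrence of each range
gr_unique <- gr[!dup]
length(gr_unique)
# [1] 3
```

Subsetting with !dup should give the same three ranges as unique(gr), and the logical vector is handy if you want to inspect the duplicates before discarding them.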

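Just be aware that unique() ignores the metadata columns (mcols) of the GRanges. Ranges with the same seqnames, coordinates and strand are collapsed into one, and only the metadata of the first is kept: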

a_gr <- GRanges(seqnames = 1,
            ranges = IRanges(start=c(1,1),
                             end =c(2,2)), 
            strand=c("+"),
            other=c("a","b"))
a_gr
# GRanges object with 2 ranges and 1 metadata column:
#       seqnames    ranges strand |       other
#          <Rle> <IRanges>  <Rle> | <character>
#   [1]        1       1-2      + |           a
#   [2]        1       1-2      + |           b

unique(a_gr)
# GRanges object with 1 range and 1 metadata column:
#       seqnames    ranges strand |       other
#          <Rle> <IRanges>  <Rle> | <character>
#   [1]        1       1-2      + |           a
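
If the metadata should count when deciding what is a duplicate, one possible workaround is to run duplicated() on a data.frame that holds both the coordinates and the mcols. This is only a sketch: b_gr, df, keep and b_gr_unique are made-up names, and converting 9 million rows to a data.frame may cost noticeable time and memory.

```r
library(GenomicRanges)

# Hypothetical example: three identical ranges, two distinct metadata values
b_gr <- GRanges(seqnames = "1",
                ranges = IRanges(start = c(1, 1, 1), end = c(2, 2, 2)),
                strand = "+",
                other = c("a", "b", "a"))

# as.data.frame() puts seqnames, start, end, width, strand and the
# metadata columns side by side, so duplicated() compares all of them
df <- as.data.frame(b_gr)
keep <- !duplicated(df)

# Rows 1 and 2 differ in 'other' and are both kept; row 3 repeats row 1
b_gr_unique <- b_gr[keep]
b_gr_unique$other
# [1] "a" "b"
```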