Reducing and aggregating GRanges with gaps using plyranges.
3.6 years ago

I just started using plyranges, and I cannot figure out how to reduce and aggregate a GRanges object while allowing a given gap width between the ranges being merged.

Example data.

library("plyranges")

df <- data.frame(
  seqnames="chrI", start=c(1, 10, 20), end=c(5, 15, 25), strand=c("+", "+", "-"),
  score=c(8, 3, 6)
)
gr <- as_granges(df)

> gr
GRanges object with 3 ranges and 1 metadata column:
      seqnames    ranges strand |     score
         <Rle> <IRanges>  <Rle> | <numeric>
  [1]     chrI       1-5      + |         8
  [2]     chrI     10-15      + |         3
  [3]     chrI     20-25      - |         6
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

Desired output, with a maximum allowed gap width of 10 and the scores summed within each merged range:

desired_output <- data.frame(
  seqnames="chrI", start=c(1, 20), end=c(15, 25), strand=c("+", "-"),
  score=c(11, 6)
)
desired_output <- as_granges(desired_output)

> desired_output
GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand |     score
         <Rle> <IRanges>  <Rle> | <numeric>
  [1]     chrI      1-15      + |        11
  [2]     chrI     20-25      - |         6
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

This is similar to section 4.1 of the HelloRanges tutorial, which does work for me because the GenomicRanges::reduce function lets you set a minimum gap width (min.gapwidth). The plyranges equivalent is reduce_ranges_directed, but it does not appear to have a gap width option.

EDIT: This has also been cross-posted to Bioconductor support.
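For comparison, here is a minimal sketch of the GenomicRanges route mentioned above. It assumes that "max allowed gap width of 10" means ranges on the same strand separated by at most 10 bp should be merged. reduce() keeps two ranges apart only when the gap between them is at least min.gapwidth, so the argument is set to max_gap + 1. with.revmap = TRUE records which input ranges went into each merged range, and that mapping is then used to sum the scores. The variable name max_gap is illustrative, not part of either package.

```r
library(GenomicRanges)
library(plyranges)

df <- data.frame(
  seqnames = "chrI", start = c(1, 10, 20), end = c(5, 15, 25),
  strand = c("+", "+", "-"), score = c(8, 3, 6)
)
gr <- as_granges(df)

max_gap <- 10L  # largest gap (in bp) that should still be merged (assumed meaning)

# reduce() only keeps ranges separate when their gap is >= min.gapwidth,
# so max_gap + 1 merges every gap of <= max_gap bp.
# Strand is respected because ignore.strand defaults to FALSE.
merged <- reduce(gr, min.gapwidth = max_gap + 1L, with.revmap = TRUE)

# revmap is an IntegerList of input indices for each merged range;
# extract the matching scores and sum them per merged range.
merged$score <- sum(extractList(gr$score, merged$revmap))
merged$revmap <- NULL
merged
```

This stays close to the HelloRanges approach. The result is an ordinary GRanges, so it can be piped into plyranges verbs afterwards.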


Will this do?

stretch(anchor_start(gr), extend=10) %>% reduce_ranges_directed(., sum.score = sum(score))
GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand | sum.score
         <Rle> <IRanges>  <Rle> | <numeric>
  [1]     chrI      1-25      + |        11
  [2]     chrI     20-35      - |         6
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

EDIT: replaced flank_right with stretch.

But I can see how it might be annoying that the extra bp end up in the result: each merged interval ends 10 bp past the last original range it contains (1-25 and 20-35 instead of 1-15 and 20-25).
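One way to trim those extra bases is sketched below. It assumes every input range was extended by the same amount at its end, which is what anchor_start() plus stretch(extend = 10) does in the output above. Under that assumption, each merged range ends exactly 10 bp past the largest original end it contains, so subtracting 10 from the end recovers the desired coordinates. The trimming step uses the plain GenomicRanges end<- replacement function rather than a plyranges verb, and max_gap is an illustrative name.

```r
library(GenomicRanges)
library(plyranges)

df <- data.frame(
  seqnames = "chrI", start = c(1, 10, 20), end = c(5, 15, 25),
  strand = c("+", "+", "-"), score = c(8, 3, 6)
)
gr <- as_granges(df)

max_gap <- 10L  # largest gap (in bp) that should still be merged

# Extend each range's end by max_gap so that neighbours separated by
# <= max_gap bp overlap or touch, then merge per strand and sum scores.
res <- gr %>%
  anchor_start() %>%
  stretch(extend = max_gap) %>%
  reduce_ranges_directed(sum.score = sum(score))

# Every merged range now ends max_gap bp too far; trim it back.
end(res) <- end(res) - max_gap
res
```

If the ranges carry seqlengths, the temporary extension can run past the end of a chromosome and trigger out-of-bound warnings before the trim.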

