GRanges manipulation
1
0
Entering edit mode
8 months ago

Hello,

I would rather not reinvent the wheel. I have a sorted GRanges object of overlapping transcript positions, each with a name. My goal is to aggregate the names over the positions where the transcripts overlap, while keeping the positions that are unique to a single transcript.

For example:

gr <- GRanges(
  seqnames = c("chr1", "chr1", "chr1", "chr1"),
  ranges = IRanges(start = c(50, 75, 80, 85),
                   end = c(110, 90, 110, 120)),
  names = c("id1", "id2", "id3", "id4")
)

The expected output would be something like:

gr_output <- GRanges(
  seqnames = c("chr1", "chr1", "chr1", "chr1","chr1", "chr1"),
  ranges = IRanges(start = c(50, 75, 80, 85, 91, 111),
                   end = c(74, 79, 84, 90, 110, 120)),
  names = c("id1", "id1;id2", "id1;id2;id3", "id1;id2;id3;id4", "id1;id3;id4", "id4")
)

Maybe something with findOverlaps, reduce and aggregate, or summarise? Or maybe another tool like bedtools?

8 months ago
Malcolm.Cook ★ 1.5k

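You're looking for disjoin. With with.revmap=TRUE it also returns a revmap column listing, for each disjoint range, the indices of the input ranges that cover it; those indices can then be used to paste the names together: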

> d <- disjoin(gr, with.revmap = TRUE)
> d
GRanges object with 6 ranges and 1 metadata column:
      seqnames    ranges strand |        revmap
         <Rle> <IRanges>  <Rle> | <IntegerList>
  [1]     chr1     50-74      * |             1
  [2]     chr1     75-79      * |           1,2
  [3]     chr1     80-84      * |         1,2,3
  [4]     chr1     85-90      * |     1,2,3,...
  [5]     chr1    91-110      * |         1,3,4
  [6]     chr1   111-120      * |             4
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> d$names <- unlist(lapply(d$revmap, \(i) paste(gr$names[i], collapse = ";")))
> d
GRanges object with 6 ranges and 2 metadata columns:
      seqnames    ranges strand |        revmap           names
         <Rle> <IRanges>  <Rle> | <IntegerList>     <character>
  [1]     chr1     50-74      * |             1             id1
  [2]     chr1     75-79      * |           1,2         id1;id2
  [3]     chr1     80-84      * |         1,2,3     id1;id2;id3
  [4]     chr1     85-90      * |     1,2,3,... id1;id2;id3;id4
  [5]     chr1    91-110      * |         1,3,4     id1;id3;id4
  [6]     chr1   111-120      * |             4             id4
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
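
A possible vectorized variant of the same idea, offered as a sketch rather than a replacement: it assumes the input object is called gr as in the question, and it swaps the lapply/paste step for extractList() (IRanges) and unstrsplit() (S4Vectors), which should give the same names column without an explicit loop.

```r
library(GenomicRanges)  # attaches IRanges and S4Vectors as dependencies

# Input from the question; "names" is stored as a metadata column.
gr <- GRanges(
  seqnames = rep("chr1", 4),
  ranges = IRanges(start = c(50, 75, 80, 85),
                   end = c(110, 90, 110, 120)),
  names = c("id1", "id2", "id3", "id4")
)

# Split the ranges into non-overlapping pieces, remembering which
# input ranges cover each piece (the revmap column).
d <- disjoin(gr, with.revmap = TRUE)

# extractList() gathers the names for each revmap entry into a CharacterList;
# unstrsplit() then collapses each list element into one ";"-separated string.
d$names <- unstrsplit(extractList(gr$names, d$revmap), sep = ";")

d
```

On large objects this may be faster than looping over d$revmap, but that is untested here.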

Thanks, you nailed it perfectly!
