GRanges: merge metadata from the same ranges
Asked 4.5 years ago by arseru

Hi everyone,

I'm having trouble merging the metadata from three different GRanges objects when their ranges are the same. Here is an example to illustrate the problem: each hypothetical "tool" finds a hit in the same genomic range (chr1:103-105):

library(GenomicRanges)
gr1 <- GRanges(
    seqnames = "chr1",
    ranges = IRanges(103, 105),
    strand = "*",
    tool1 = "Hit",
    tool2 = NA,
    tool3 = NA
)
gr2 <- GRanges(
    seqnames = "chr1",
    ranges = IRanges(103, 105),
    strand = "*",
    tool1 = NA,
    tool2 = "Hit",
    tool3 = NA
)
gr3 <- GRanges(
    seqnames = "chr1",
    ranges = IRanges(103, 105),
    strand = "*",
    tool1 = NA,
    tool2 = NA,
    tool3 = "Hit"
)
# Concatenate the three objects, then sort the seqlevels and the ranges
merged <- sort(sortSeqlevels(c(gr1, gr2, gr3)))

This gives the following object, which still has one row per input range:

GRanges object with 3 ranges and 3 metadata columns:
      seqnames    ranges strand |       tool1       tool2       tool3
         <Rle> <IRanges>  <Rle> | <character> <character> <character>
  [1]     chr1   103-105      * |         Hit        <NA>        <NA>
  [2]     chr1   103-105      * |        <NA>         Hit        <NA>
  [3]     chr1   103-105      * |        <NA>        <NA>         Hit
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

What I would like to have is a single line with the metadata merged, like this:

GRanges object with 1 range and 3 metadata columns:
      seqnames    ranges strand |       tool1       tool2       tool3
         <Rle> <IRanges>  <Rle> | <character> <character> <character>
  [1]     chr1   103-105      * |         Hit         Hit         Hit
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

Is that possible? Thank you very much for your help!
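A minimal sketch of one way to get there. It assumes that only rows with exactly identical coordinates (seqnames, start, end, strand) should collapse, and that each tool column holds at most one non-NA value per range. It matches every row to its distinct range with match() and keeps the first non-NA value per column. The helper first_non_na and the variable names collapsed and grp are hypothetical, chosen for illustration:

```r
library(GenomicRanges)

# Rebuild the example from the question: each tool reports the same range.
gr1 <- GRanges("chr1", IRanges(103, 105), tool1 = "Hit", tool2 = NA, tool3 = NA)
gr2 <- GRanges("chr1", IRanges(103, 105), tool1 = NA, tool2 = "Hit", tool3 = NA)
gr3 <- GRanges("chr1", IRanges(103, 105), tool1 = NA, tool2 = NA, tool3 = "Hit")
merged <- sort(sortSeqlevels(c(gr1, gr2, gr3)))

# Hypothetical helper: first non-NA value in a group, or NA if there is none.
first_non_na <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) > 0) v[[1]] else NA_character_
}

# One row per distinct (seqnames, start, end, strand) combination, no metadata.
collapsed <- unique(granges(merged))
# For each row of 'merged', the position of its range in 'collapsed'.
grp <- factor(match(merged, collapsed), levels = seq_along(collapsed))

# Collapse every metadata column group by group and attach it to 'collapsed'.
for (nm in colnames(mcols(merged))) {
  vals <- as.character(mcols(merged)[[nm]])
  mcols(collapsed)[[nm]] <- unname(vapply(split(vals, grp), first_non_na, character(1)))
}
collapsed
```

Ranges that overlap without being identical stay on separate rows with this approach; merging those requires building a reference set of regions first.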

Is it always a choice between Hit and NA, or can other strings appear as well?

The choice is between NA and any other string, not necessarily Hit.
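Since the tool columns can hold any string, the per-range combiner should not hard-code "Hit". Below is a small base-R sketch that assumes a grouping index saying which collapsed range each row belongs to (for example, the output of match() on the ranges). The function combine_strings and the toy values are hypothetical. Keeping all distinct non-NA values, rather than only the first, avoids silently dropping information if a tool ever reports two different strings for the same range:

```r
# Hypothetical combiner: keep every distinct non-NA string, joined by 'sep',
# and return NA when a group contains no calls at all.
combine_strings <- function(v, sep = ";") {
  v <- unique(v[!is.na(v)])
  if (length(v) == 0) NA_character_ else paste(v, collapse = sep)
}

# Toy values for one tool column (made up). 'group' gives, for each row,
# the index of the collapsed range it belongs to.
calls <- c("peakA", NA, "peakC", "peakB", NA)
group <- c(1L, 1L, 1L, 2L, 3L)

# One combined value per group, in group order.
combined <- unname(vapply(split(calls, group), combine_strings, character(1)))
combined
```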

It would help to know exactly what you're trying to accomplish. If the ranges are identical across all of the GRanges objects you create, then you already have a reference set of ranges on which to score each tool, and each GRanges object holds the results for one tool. In that case you can (1) build a master set the conventional way: refSet$tool1 <- gr1$tool1, refSet$tool2 <- gr2$tool2, and so on. If the ranges differ between objects and you want to record which tools hit which regions (whether or not those regions overlap, or are even present in every object), then you'll need to (2) build a reference set by reducing the ranges and then score how each tool hits those regions. Are your regions always identical between objects?
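Approach (2) can be sketched as follows, assuming each tool's hits are already in their own GRanges object. The list name tools and its coordinates are made up for illustration. reduce() merges overlapping and adjacent ranges into the reference set, and overlapsAny() then records, per reference region, whether a given tool hit it:

```r
library(GenomicRanges)

# Toy per-tool results whose ranges differ between tools (coordinates made up).
tools <- list(
  tool1 = GRanges("chr1", IRanges(c(103, 300), c(105, 320))),
  tool2 = GRanges("chr1", IRanges(104, 110)),
  tool3 = GRanges("chr1", IRanges(500, 510))
)

# Reference set: all hit ranges pooled, with overlapping/adjacent ones merged.
ref <- reduce(unlist(GRangesList(tools), use.names = FALSE))

# Score each tool against every reference region (TRUE = at least one hit).
for (nm in names(tools)) {
  mcols(ref)[[nm]] <- overlapsAny(ref, tools[[nm]])
}
ref
```

Here the tool1 hit at 103-105 and the tool2 hit at 104-110 fall into the same reference region (103-110), so they share one row instead of appearing as two.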

What I want is a single merged GRanges with the hits found by the three tools, intersected with a reference BED file to "unify" the coordinates. Hits will be identical across tools when they were found in the same region, but each tool also reports hits in regions that the others don't. So when I concatenate the objects, tools that reported the same region don't get their metadata merged into a single range; instead they produce duplicated rows, as in the example output in my first message. That last step is the one I can't solve :)
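A sketch of intersecting each tool's hits with the reference regions and carrying that tool's string into its own column. It assumes the reference regions are already a GRanges object (for example, loaded from the BED file with rtracklayer's import()); the names ref, hits_by_tool and the call column, and all coordinates, are hypothetical. Because every tool is written onto the same reference rows, hits in the same region end up on one row rather than as duplicates:

```r
library(GenomicRanges)

# Reference regions; in practice these might come from something like
# rtracklayer::import("reference.bed")  (hypothetical file name).
ref <- GRanges("chr1", IRanges(c(100, 300), c(110, 320)))

# Toy per-tool hits carrying arbitrary strings (made up).
hits_by_tool <- list(
  tool1 = GRanges("chr1", IRanges(103, 105), call = "Hit"),
  tool2 = GRanges("chr1", IRanges(c(104, 305), c(106, 310)), call = c("Hit", "weak")),
  tool3 = GRanges("chr1", IRanges(600, 610), call = "Hit")  # outside every region
)

for (nm in names(hits_by_tool)) {
  tool <- hits_by_tool[[nm]]
  ov <- findOverlaps(ref, tool)              # query = reference, subject = tool hits
  vals <- rep(NA_character_, length(ref))    # NA where the tool has no hit
  # Group the tool's calls by the reference region they overlap; if several
  # hits land in one region, keep their distinct calls joined by ';'.
  per_ref <- split(tool$call[subjectHits(ov)], queryHits(ov))
  vals[as.integer(names(per_ref))] <- vapply(per_ref,
      function(v) paste(unique(v), collapse = ";"), character(1))
  mcols(ref)[[nm]] <- vals
}
ref
```

Hits that fall outside every reference region (tool3 here) are dropped, and reference regions that no tool hit keep NA in every column, so they can be filtered out afterwards if needed.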
