Question: GRanges: merge metadata from the same ranges
20 days ago, arseru wrote:

Hi everyone,

I'm having trouble merging the metadata from 3 different GRanges objects whose hits fall in the same range. Here is an example to illustrate the problem. Each theoretical "tool" finds a hit in the same genomic range (chr1:103-105):

library(GenomicRanges)
gr1 <- GRanges(
    seqnames = "chr1",
    ranges = IRanges(103, 105),
    strand = "*",
    tool1 = "Hit",
    tool2 = NA,
    tool3 = NA
)
gr2 <- GRanges(
    seqnames = "chr1",
    ranges = IRanges(103, 105),
    strand = "*",
    tool1 = NA,
    tool2 = "Hit",
    tool3 = NA
)
gr3 <- GRanges(
    seqnames = "chr1",
    ranges = IRanges(103, 105),
    strand = "*",
    tool1 = NA,
    tool2 = NA,
    tool3 = "Hit"
)
merged = sort(sortSeqlevels(c(gr1, gr2, gr3)))

The result of this is the following merged object:

GRanges object with 3 ranges and 3 metadata columns:
      seqnames    ranges strand |       tool1       tool2       tool3
         <Rle> <IRanges>  <Rle> | <character> <character> <character>
  [1]     chr1   103-105      * |         Hit        <NA>        <NA>
  [2]     chr1   103-105      * |        <NA>         Hit        <NA>
  [3]     chr1   103-105      * |        <NA>        <NA>         Hit
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

What I would like to have is a single line with the metadata merged, like this:

GRanges object with 1 range and 3 metadata columns:
      seqnames    ranges strand |       tool1       tool2       tool3
         <Rle> <IRanges>  <Rle> | <character> <character> <character>
  [1]     chr1   103-105      * |         Hit         Hit         Hit
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

Is that possible? Thank you very much for your help!


Is the choice always only between Hit and NA, or can other characters/names come in?

written 20 days ago by ATpoint

The choice is between NA and any other string, not necessarily Hit.

written 19 days ago by arseru

It would help to know exactly what you're trying to accomplish. If the ranges are identical across the GRanges objects you are creating, then you already have a set of reference ranges on which to score each tool, and each GRanges instance holds the results for one tool. In that case you can (1) aggregate the tool results into a master set in a more traditional way: refSet$tool1 <- gr1$tool1, refSet$tool2 <- gr2$tool2, etc. If the ranges differ between your GRanges instances, and you want to aggregate how the tools hit regions (whether or not the regions overlap or are even present in every instance), then you'll have to (2) create a reference set by reducing the ranges, and then score how your tools hit those regions. Are your regions always identical between instances?

written 20 days ago by seidel
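
Option (2) above could be sketched roughly as follows. This is only an illustration under stated assumptions: the per-tool ranges, the list name `hits`, and the metadata column name `label` are made up for the example, and each tool's result is assumed to be stored in its own GRanges object. The reference set is built with reduce(), and each tool is then scored against it with findOverlaps().

```r
library(GenomicRanges)

# Hypothetical per-tool results; ranges need not be identical across tools.
# The column name `label` is an assumption made for this sketch.
hits <- list(
    tool1 = GRanges("chr1", IRanges(c(103, 200), c(105, 210)),
                    label = c("Hit", "Hit")),
    tool2 = GRanges("chr1", IRanges(103, 105), label = "Hit"),
    tool3 = GRanges("chr1", IRanges(300, 310), label = "Hit")
)

# Reference set: all ranges pooled, with overlapping or adjacent ranges merged.
# reduce() drops the metadata columns.
ref <- reduce(unlist(GRangesList(hits), use.names = FALSE))

# Score each tool against the reference set, one metadata column per tool.
for (tool in names(hits)) {
    ov <- findOverlaps(ref, hits[[tool]])
    col <- rep(NA_character_, length(ref))
    # If several hits of one tool overlap the same reference range,
    # the last one wins in this sketch.
    col[queryHits(ov)] <- hits[[tool]]$label[subjectHits(ov)]
    mcols(ref)[[tool]] <- col
}

ref
```

Note that reduce() merges ranges that overlap or touch, so the reference coordinates can be wider than any single reported hit. Whether that is acceptable depends on how the coordinates are unified against the reference BED file.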

What I want to accomplish is a merged GRanges with the hits found by the 3 different tools, intersected with a reference BED file to "unify" coordinates. Hits will ultimately have identical ranges whenever the tools found them in the same region, but the tools also report hits in regions that are not shared between them. So when I concatenate the objects, tools that reported identical regions do not get their metadata merged into a single range; instead they produce duplicated lines, as in the example output in my first message. That last detail is the one I'm not able to solve :)

written 19 days ago by arseru
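
For the case described here (exactly identical ranges duplicated after concatenation), one possible approach is to group the rows by range and keep the first non-NA value per metadata column. The sketch below is not taken from the thread: `collapse_identical` is a hypothetical helper name, and it assumes character-like metadata columns and at most one non-NA value per tool and range (if there are several, only the first is kept).

```r
library(GenomicRanges)

# Hypothetical helper (not part of GenomicRanges): collapse rows that share
# the exact same seqname, start, end and strand into one row.
collapse_identical <- function(gr) {
    u <- unique(granges(gr))        # distinct ranges, metadata dropped
    grp <- match(granges(gr), u)    # for each input row, its index in `u`
    first_non_na <- function(x) {
        x <- x[!is.na(x)]
        if (length(x) == 0) NA_character_ else as.character(x[1])
    }
    for (col in names(mcols(gr))) {
        # split() orders the groups 1..length(u), matching the order of `u`
        vals <- vapply(split(mcols(gr)[[col]], grp), first_non_na, character(1))
        mcols(u)[[col]] <- unname(vals)
    }
    u
}

# Usage with the three objects from the question
gr1 <- GRanges("chr1", IRanges(103, 105), tool1 = "Hit", tool2 = NA, tool3 = NA)
gr2 <- GRanges("chr1", IRanges(103, 105), tool1 = NA, tool2 = "Hit", tool3 = NA)
gr3 <- GRanges("chr1", IRanges(103, 105), tool1 = NA, tool2 = NA, tool3 = "Hit")

collapsed <- collapse_identical(sort(c(gr1, gr2, gr3)))
collapsed
```

Ranges reported by only one tool pass through unchanged, with NA in the other tools' columns. Ranges that merely overlap are not collapsed by this sketch, since match() only pairs ranges that are exactly equal.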