4.4 years ago
arseru
Hi everyone,
I'm having trouble merging the metadata from 3 different GRanges objects that fall in the same range. I'll give an example to better illustrate my problem. Each theoretical "tool" here finds a hit in the same genomic range (chr1:103-105):
library(GenomicRanges)
gr1 <- GRanges(
seqnames = "chr1",
ranges = IRanges(103, 105),
strand = "*",
tool1 = "Hit",
tool2 = NA,
tool3 = NA
)
gr2 <- GRanges(
seqnames = "chr1",
ranges = IRanges(103, 105),
strand = "*",
tool1 = NA,
tool2 = "Hit",
tool3 = NA
)
gr3 <- GRanges(
seqnames = "chr1",
ranges = IRanges(103, 105),
strand = "*",
tool1 = NA,
tool2 = NA,
tool3 = "Hit"
)
merged = sort(sortSeqlevels(c(gr1, gr2, gr3)))
The result of this is the following merged object:
GRanges object with 3 ranges and 3 metadata columns:
seqnames ranges strand | tool1 tool2 tool3
<Rle> <IRanges> <Rle> | <character> <character> <character>
[1] chr1 103-105 * | Hit <NA> <NA>
[2] chr1 103-105 * | <NA> Hit <NA>
[3] chr1 103-105 * | <NA> <NA> Hit
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
What I would like to have is a single line with the metadata merged, like this:
GRanges object with 1 range and 3 metadata columns:
seqnames ranges strand | tool1 tool2 tool3
<Rle> <IRanges> <Rle> | <character> <character> <character>
[1] chr1 103-105 * | Hit Hit Hit
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
Is that possible? Thank you very much for your help!
Is it always only a choice between "Hit" and NA, or can other characters/names come in?
The choice is between NA and any other string, not necessarily "Hit".
Maybe some information is required as to what exactly you're trying to accomplish. If the ranges will be identical between each GRanges object you are creating, then you already have a set of reference ranges on which to score each tool, and each GRanges instance contains the results for that tool, so you can (1) aggregate your tool results in a master set in a more traditional way: refSet$tool1 <- gr1$tool1, refSet$tool2 <- gr2$tool2, etc. If the ranges will be different between your GRanges instances, and you're trying to aggregate how tools hit regions (whether or not they overlap or are even present between instances), then you'll have to (2) create a reference set by reducing the ranges, and then score how your tools hit those regions. Are your regions always identical between instances?
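Option (2) could look roughly like the sketch below. It is only an illustration under assumptions: the per-tool coordinates are made up, the list name hits is hypothetical, and "Hit" is used as a placeholder score. The reference set is built with reduce(), and each tool is scored against it with overlapsAny().

```r
library(GenomicRanges)

# Hypothetical per-tool hit sets (coordinates invented for illustration).
hits <- list(
  tool1 = GRanges("chr1", IRanges(c(100, 300), c(110, 320))),
  tool2 = GRanges("chr1", IRanges(105, 120)),
  tool3 = GRanges("chr1", IRanges(500, 510))
)

# Reference set: pool every hit, then merge overlapping ranges into one.
all_hits <- do.call(c, unname(hits))
refSet <- reduce(all_hits)

# Score each tool: "Hit" where the tool overlaps a reference range, NA otherwise.
for (tool in names(hits)) {
  mcols(refSet)[[tool]] <- ifelse(
    overlapsAny(refSet, hits[[tool]]), "Hit", NA_character_
  )
}

refSet
```

Note that reduce() merges ranges that merely overlap, so here the tool1 hit at 100-110 and the tool2 hit at 105-120 end up in a single reference range, 100-120. If only exactly identical ranges should be merged, this is not the right reference set.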
What I want to accomplish is a merged GRanges with the hits found for the 3 different tools, intersected with a reference BED file to "unify" coordinates. So the thing is that the hits found will ultimately be identical between the tools if they found the hit in the same region, but different tools also report hits in different regions that will not be common between them. So when concatenating them, the tools that have reported identical regions will not be merging their metadata into a single range; rather they will have duplicated lines, similar to the example output that I showed in my first message. And that last detail is the one I'm not able to solve :)
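One possible way to collapse the duplicated lines, assuming that ranges count as "the same" only when seqnames, start, end and strand are all identical, and that each tool contributes at most one non-NA value per range. The helper first_non_na and the extra object gr3b (a tool3 hit in a region no other tool reports) are hypothetical additions for illustration; this is a sketch, not a definitive solution.

```r
library(GenomicRanges)

# Toy input: three tools hit chr1:103-105, and tool3 also hits chr1:200-210.
gr1 <- GRanges("chr1", IRanges(103, 105),
               tool1 = "Hit", tool2 = NA_character_, tool3 = NA_character_)
gr2 <- GRanges("chr1", IRanges(103, 105),
               tool1 = NA_character_, tool2 = "Hit", tool3 = NA_character_)
gr3 <- GRanges("chr1", IRanges(103, 105),
               tool1 = NA_character_, tool2 = NA_character_, tool3 = "Hit")
gr3b <- GRanges("chr1", IRanges(200, 210),
                tool1 = NA_character_, tool2 = NA_character_, tool3 = "Hit")

merged <- sort(sortSeqlevels(c(gr1, gr2, gr3, gr3b)))

# One row per distinct range; granges() drops the metadata columns.
collapsed <- unique(granges(merged))

# For every row of merged, the index of its identical range in collapsed.
idx <- factor(match(merged, collapsed), levels = seq_along(collapsed))

# Hypothetical helper: first non-NA value in a group, or NA if there is none.
first_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) > 0) x[1] else NA_character_
}

# Collapse each metadata column group by group.
for (col in names(mcols(merged))) {
  groups <- split(mcols(merged)[[col]], idx)
  mcols(collapsed)[[col]] <- unname(vapply(groups, first_non_na, character(1)))
}

collapsed
```

With this toy input, chr1:103-105 becomes a single row carrying the values of all three tools, while chr1:200-210 keeps only the tool3 value. If two rows could carry different non-NA strings for the same tool and range, first_non_na would silently keep the first one, so a different rule (for example pasting the values together) may be preferable in that case.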