Hi all:
I have a little bit of experience working with GRanges objects in R (from the GenomicRanges package in Bioconductor), but I keep running into a subsetting case that should be more straightforward than the solution I'm using.
Let's say I have the following GRanges object (using the example from the reference):
library(GenomicRanges)
gr2 <- GRanges(seqnames = c("chr1", "chr1"),
ranges = IRanges(c(7,13), width = 3), strand = c("+", "-")) #sample GRanges object
...which looks like this:
gr2
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 [ 7, 9] +
[2] chr1 [13, 15] -
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
I'd like to access all start positions in this object in a strand-specific manner, where I define the "start" to be the first value in the IRanges interval if the range is on the plus strand, and the second value if it's on the minus strand. Of course, the accessors start() and end() are both agnostic to strand: they return the first or second value of every interval, respectively.
For example:
start(gr2)
[1] 7 13
and
end(gr2)
[1] 9 15
My current work-around (which is ugly) looks something like the following:
plus.i <- which(strand(gr2) == "+") #which intervals are on the positive strand?
start(gr2[plus.i]) #getting the strand-specific 'start' from those intervals
[1] 7
minus.i <- which(strand(gr2) == "-") #which intervals are on the negative strand?
end(gr2[minus.i]) #getting the strand-specific 'start' from those intervals
[1] 15
I then concatenate the two resulting vectors using c().
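For reference, the same work-around can be collapsed into one vectorised expression, which also keeps the results in the original range order (the c() version puts all plus-strand values first). This is only a sketch of what I'm doing now, not a claim that it's the idiomatic way. It assumes every range is on "+" or "-" and treats anything else (e.g. "*") like the plus strand; the names is.minus and five.prime are just my own labels.

```r
library(GenomicRanges)

gr2 <- GRanges(seqnames = c("chr1", "chr1"),
               ranges = IRanges(c(7, 13), width = 3),
               strand = c("+", "-"))

# Convert the strand Rle to a plain character vector so that
# ifelse() receives an ordinary logical test
is.minus <- as.character(strand(gr2)) == "-"

# Take end() for minus-strand ranges and start() for everything else
five.prime <- ifelse(is.minus, end(gr2), start(gr2))
five.prime
```

For the two-range example this gives 7 for the plus-strand range and 15 for the minus-strand range, in the same order as the ranges in gr2.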
There must be an easier, more GRanges-centric approach to access these strand-specific 'starts'. Can anyone point me in the right direction? The real-world case I'm dealing with involves alignments of mapped, strand-specific CAGE tags, where the 5' end of each interval represents the TSS (transcription start site).
Thanks in advance,
Taylor