Question: How to show all columns after performing GenomicRanges::subsetByOverlaps
1
gravatar for gundalav
9 months ago by
gundalav260
La La Land
gundalav260 wrote:

I have the following two GenomicRanges objects. First one gr1 is this:


library(GenomicRanges)
set.seed(1)
gr1 <- GRanges(
        seqnames=Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
        ranges=IRanges(1:10, width=10:1, names=head(letters,10)),
        strand=Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
        motif_score=seq(1, 0, length=10),
         motif_name=paste0("Motif_", toupper(sample(c(letters,letters))))[1:10]
      )
gr1
#> GRanges object with 10 ranges and 2 metadata columns:
#>     seqnames    ranges strand | motif_score  motif_name
#>        <Rle> <IRanges>  <Rle> |   <numeric> <character>
#>   a     chr1  [ 1, 10]      - |   1.0000000     Motif_N
#>   b     chr2  [ 2, 10]      + |   0.8888889     Motif_S
#>   c     chr2  [ 3, 10]      + |   0.7777778     Motif_C
#>   d     chr2  [ 4, 10]      * |   0.6666667     Motif_S
#>   e     chr1  [ 5, 10]      * |   0.5555556     Motif_J
#>   f     chr1  [ 6, 10]      + |   0.4444444     Motif_Q
#>   g     chr3  [ 7, 10]      + |   0.3333333     Motif_R
#>   h     chr3  [ 8, 10]      + |   0.2222222     Motif_D
#>   i     chr3  [ 9, 10]      - |   0.1111111     Motif_B
#>   j     chr3  [10, 10]      - |   0.0000000     Motif_C
#>   -------
#>   seqinfo: 3 sequences from an unspecified genome; no seqlengths

And the second object gr2:

gr2 <- GRanges(seqnames="chr2", 
               ranges=IRanges(4:3, 6),
               peak_name=c("peak_1", "peak_2"),
               strand="+", peak_score=5:4)
gr2
#> GRanges object with 2 ranges and 2 metadata columns:
#>       seqnames    ranges strand |   peak_name peak_score
#>          <Rle> <IRanges>  <Rle> | <character>  <integer>
#>   [1]     chr2    [4, 6]      + |      peak_1          5
#>   [2]     chr2    [3, 6]      + |      peak_2          4
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths

Then I perform the region overlap using subsetByOverlaps between gr1 and gr2

 subsetByOverlaps(gr1, gr2)
#> GRanges object with 3 ranges and 2 metadata columns:
#>     seqnames    ranges strand | motif_score  motif_name
#>        <Rle> <IRanges>  <Rle> |   <numeric> <character>
#>   b     chr2   [2, 10]      + |   0.8888889     Motif_S
#>   c     chr2   [3, 10]      + |   0.7777778     Motif_C
#>   d     chr2   [4, 10]      * |   0.6666667     Motif_S
#>   -------
#>   seqinfo: 3 sequences from an unspecified genome; no seqlengths

as you can see the peak_name and peak_score column does not appear after intersection. How can I show them all?

ADD COMMENTlink modified 9 months ago by Sean Davis25k • written 9 months ago by gundalav260
0
gravatar for Sean Davis
9 months ago by
Sean Davis25k
National Institutes of Health, Bethesda, MD
Sean Davis25k wrote:

The subsetByOverlaps method finds the subset of the first argument that overlaps the second. It does not do a "merge" or anything like that, so columns in the first argument are the only ones returned. If you swap gr1 and gr2, you'll get the subset of gr2 that overlaps gr1 and that will include the peak_score and peak_name columns (but not the motif_score or motif_name columns).

If, as I suspect, you want to know what regions overlap with what other regions, see the findOverlaps method.

ADD COMMENTlink written 9 months ago by Sean Davis25k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 732 users visited in the last hour