Question: Computing Relative Coordinates Of One Bioconductor Granges Object To Another?
2
gravatar for Ryan Thompson
6.9 years ago by
Ryan Thompson3.4k
TSRI, La Jolla, CA
Ryan Thompson3.4k wrote:

I have a set of mapped reads as a GRanges object in BioConductor, and I am grouping them into "clusters" of overlapping reads (regions of contiguous nonzero coverage). I also have the cluster coordinates as a GRanges object, obtained by calling reduce on an unstranded version of the reads, and then arbitrarily assigning a strand to each cluster (actually I have a method for assigning a specific strand to each, but it's not important).

library(ShortRead)
bamfile <- file.path(system.file("extdata", package="Rsamtools"), "ex1.bam")
read.ranges <- as(readAligned(bamfile, type="BAM"), "GRanges")
cluster.ranges <- reduce(GRanges(seqnames=seqnames(read.ranges),
                                 ranges=ranges(read.ranges),
                                 ## Don't consider strand when merging
                                 ## reads into clusters
                                 strand="*",
                                 seqlengths=seqlengths(read.ranges)))
## Each cluster has a strand, assigned via an unspecified method
strand(cluster.ranges) <- c("+", "-")
## Assign an arbitrary ID to each cluster
elementMetadata(cluster.ranges)$id <- sprintf("cluster%s", seq(length(cluster.ranges)))

So basically, I have two GRanges objects, read.ranges and cluster.ranges, and every range in the first obejct is contained in exactly one range in the second object:

> read.ranges
GRanges with 3242 ranges and 2 elementMetadata cols:
         seqnames       ranges strand   |                       id      flag
            <Rle>    <IRanges>  <Rle>   |             <BStringSet> <integer>
     [1]     seq1     [ 1, 36]      +   |      B7_591:4:96:693:509        73
     [2]     seq1     [ 3, 37]      +   |   EAS54_65:7:152:368:113        73
     [3]     seq1     [ 5, 39]      +   |      EAS51_64:8:5:734:57       137
     [4]     seq1     [ 6, 41]      +   |     B7_591:1:289:587:906       137
     [5]     seq1     [ 9, 43]      +   |    EAS56_59:8:38:671:758       137
     [6]     seq1     [13, 47]      +   |    EAS56_61:6:18:467:281        73
     [7]     seq1     [13, 48]      +   |  EAS114_28:5:296:340:699       137
     [8]     seq1     [15, 49]      +   |     B7_597:6:194:894:408        73
     [9]     seq1     [18, 52]      -   |    EAS188_4:8:12:628:973        89
     ...      ...          ...    ... ...                      ...       ...
  [3234]     seq2 [1520, 1554]      +   |  EAS114_45:6:86:859:1779       137
  [3235]     seq2 [1523, 1555]      -   |   EAS54_71:8:105:854:975        83
  [3236]     seq2 [1524, 1558]      -   |   EAS51_62:4:187:907:145       153
  [3237]     seq2 [1524, 1557]      +   |   EAS54_71:4:284:269:882        73
  [3238]     seq2 [1524, 1558]      +   |     EAS56_63:4:141:9:811       137
 [ reached getOption("max.print") -- omitted 4 rows ]
  ---
  seqlengths:
   seq1 seq2
     NA   NA
> cluster.ranges
GRanges with 2 ranges and 1 elementMetadata col:
      seqnames    ranges strand |          id
         <Rle> <IRanges>  <Rle> | <character>
  [1]     seq1 [1, 1569]      + |    cluster1
  [2]     seq2 [1, 1567]      - |    cluster2
  ---
  seqlengths:
   seq1 seq2
     NA   NA

From this, I would like to transform the coordinates of the reads into coordinates relative to the clusters to which they belong. This includes inverting the strand and reversing coordinates of reads within a cluster whose strand is "-", Is there a function to do this?

bioconductor coordinates • 2.2k views
ADD COMMENTlink written 6.9 years ago by Ryan Thompson3.4k
2
gravatar for Malcolm.Cook
6.9 years ago by
Malcolm.Cook1.0k
kansas, usa
Malcolm.Cook1.0k wrote:

I don't think there is a function to do exactly that.

If I understand the specifics of your strand considerations, the following should be close

# find which cluster each range is within, ignoring strand
hits<-cluster.ranges[findOverlaps(read.ranges,cluster.ranges,select='first',ignore.strand=TRUE)]
# start with the read.ranges themselves
read.ranges.on.hits<-read.ranges
# update each range relative to its cluster hit
ranges(read.ranges.on.hits)<-ranges(narrow(hits,start(read.ranges),end(read.ranges)))
# identify which hits are to clusters on negative strand
neg<-as.vector(strand(hits)=='-')
# reflect the coordinates and the strand of the reads in neg strand clusters
ranges(read.ranges.on.hits)[neg]<-reflect(ranges(read.ranges.on.hits[neg]),ranges(hits[neg]))
strand(read.ranges.on.hits)[neg]<-ifelse('-'==strand(read.ranges.on.hits)[neg],'+','-')

which yields

> read.ranges.on.hits
GRanges with 3242 ranges and 2 elementMetadata values:
         seqnames    ranges strand   |                       id      flag
            <Rle> <IRanges>  <Rle>   |             <BStringSet> <integer>
     [1]     seq1  [ 1, 36]      +   |      B7_591:4:96:693:509        73
     [2]     seq1  [ 3, 37]      +   |   EAS54_65:7:152:368:113        73
     [3]     seq1  [ 5, 39]      +   |      EAS51_64:8:5:734:57       137
     [4]     seq1  [ 6, 41]      +   |     B7_591:1:289:587:906       137
     [5]     seq1  [ 9, 43]      +   |    EAS56_59:8:38:671:758       137
     [6]     seq1  [13, 47]      +   |    EAS56_61:6:18:467:281        73
     [7]     seq1  [13, 48]      +   |  EAS114_28:5:296:340:699       137
     [8]     seq1  [15, 49]      +   |     B7_597:6:194:894:408        73
     [9]     seq1  [18, 52]      -   |    EAS188_4:8:12:628:973        89
     ...      ...       ...    ... ...                      ...       ...
  [3234]     seq2  [14, 48]      -   |  EAS114_45:6:86:859:1779       137
  [3235]     seq2  [13, 45]      +   |   EAS54_71:8:105:854:975        83
  [3236]     seq2  [10, 44]      +   |   EAS51_62:4:187:907:145       153
  [3237]     seq2  [11, 44]      -   |   EAS54_71:4:284:269:882        73
  [3238]     seq2  [10, 44]      -   |     EAS56_63:4:141:9:811       137
  [3239]     seq2  [10, 44]      -   |  EAS114_30:6:277:397:932        73
  [3240]     seq2  [ 6, 40]      +   | EAS139_11:7:50:1229:1313        83
  [3241]     seq2  [ 2, 36]      +   |    EAS54_65:3:320:20:250       147
  [3242]     seq2  [ 1, 35]      +   |    EAS114_26:7:37:79:581        83
  ---
  seqlengths:
   seq1 seq2
     NA   NA
ADD COMMENTlink modified 6.9 years ago • written 6.9 years ago by Malcolm.Cook1.0k

Oops, I forgot to mention the part about giving each cluster a unique ID and then using that as the seqnames of the new GRanges object, but that's easy to do myself.

ADD REPLYlink written 6.9 years ago by Ryan Thompson3.4k

It looks to me like you already have already given each cluster a unique ID.

To use it as the seqnames of the result, just initialize read.ranges.on.hits instead with:

 read.ranges.on.hits<-GRanges(values(hits)$id,ranges(read.ranges),strand(read.ranges))
ADD REPLYlink written 6.9 years ago by Malcolm.Cook1.0k

Oh yeah, I guess I did.

ADD REPLYlink written 6.9 years ago by Ryan Thompson3.4k

ok - so - do you accept the answer as complete? if so, then click the green arrow! Cheers, malcolm.

ADD REPLYlink written 6.9 years ago by Malcolm.Cook1.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1122 users visited in the last hour