Question: GRanges : setdiff and keep extra columns
0
gravatar for Nicolas Rosewick
2.5 years ago by
Belgium, Brussels
Nicolas Rosewick8.6k wrote:

Hi,

I've two GRanges as :

> gene.pos.gr
GRanges object with 63677 ranges and 1 metadata column:
                seqnames               ranges strand | ensembl_gene_id
                   <Rle>            <IRanges>  <Rle> |     <character>
      [1]    HG991_PATCH [66119285, 66465398]      + | ENSG00000261657
      [2]             13 [23551994, 23552136]      - | ENSG00000223116
      [3]             13 [23708313, 23708703]      + | ENSG00000233440
      [4]             13 [23726725, 23726825]      - | ENSG00000207157
      [5]             13 [23743974, 23744736]      - | ENSG00000229483
      ...            ...                  ...    ... .             ...
  [63673] HSCHR17_1_CTG1     [ 62293, 253878]      - | ENSG00000262334
  [63674] HSCHR17_1_CTG1     [189016, 190077]      + | ENSG00000262737
  [63675] HSCHR17_1_CTG1     [198829, 201112]      + | ENSG00000263267
  [63676] HSCHR17_1_CTG1     [220262, 225205]      + | ENSG00000262336
  [63677] HSCHR17_1_CTG1     [222361, 224506]      + | ENSG00000262005
  -------
  seqinfo: 265 sequences from an unspecified genome; no seqlengths

and

> exon.pos.gr
GRanges object with 738009 ranges and 1 metadata column:
                 seqnames               ranges strand | ensembl_gene_id
                    <Rle>            <IRanges>  <Rle> |     <character>
       [1]    HG991_PATCH [66119285, 66119659]      + | ENSG00000261657
       [2]    HG991_PATCH [66298434, 66298819]      + | ENSG00000261657
       [3]    HG991_PATCH [66314236, 66314392]      + | ENSG00000261657
       [4]    HG991_PATCH [66320895, 66321004]      + | ENSG00000261657
       [5]    HG991_PATCH [66339743, 66339847]      + | ENSG00000261657
       ...            ...                  ...    ... .             ...
  [738005] HSCHR17_1_CTG1     [200680, 201068]      + | ENSG00000263267
  [738006] HSCHR17_1_CTG1     [220262, 220486]      + | ENSG00000262336
  [738007] HSCHR17_1_CTG1     [224683, 225205]      + | ENSG00000262336
  [738008] HSCHR17_1_CTG1     [222361, 223093]      + | ENSG00000262005
  [738009] HSCHR17_1_CTG1     [224327, 224506]      + | ENSG00000262005
  -------
  seqinfo: 265 sequences from an unspecified genome; no seqlengths

I want to do the difference between them so I used difference <- setdiffgene.pos.gr,exon.pos.gr) It gives me :

> difference
GRanges object with 290454 ranges and 0 metadata columns:
                 seqnames                 ranges strand
                    <Rle>              <IRanges>  <Rle>
       [1]    HG991_PATCH   [66119660, 66298433]      +
       [2]    HG991_PATCH   [66298820, 66314235]      +
       [3]    HG991_PATCH   [66314393, 66320894]      +
       [4]    HG991_PATCH   [66321005, 66339286]      +
       [5]    HG991_PATCH   [66339343, 66339742]      +
       ...            ...                    ...    ...
  [290450] HSCHR2_2_CTG12 [149868059, 149870556]      +
  [290451] HSCHR2_2_CTG12 [149870662, 149872728]      +
  [290452] HSCHR2_2_CTG12 [149872946, 149874163]      +
  [290453] HSCHR2_2_CTG12 [149874278, 149882769]      +
  [290454] HSCHR2_2_CTG12 [149882977, 149885671]      +
  -------
  seqinfo: 265 sequences from an unspecified genome; no seqlengths

How to keep the extra columns from the gene.pos.gr GRanges, especially the ensembl_gene_id column

Thanks

genomicranges • 2.6k views
ADD COMMENTlink modified 16 months ago by Kevin Blighe53k • written 2.5 years ago by Nicolas Rosewick8.6k

I am not sure if you can do that directly using any of the set functions (intersect, union etc) of GRanges. The alternative is to findOverlap between your gene.pos.gr and difference, and using that, get the required metadata and add it to the difference. For example, see https://stat.ethz.ch/pipermail/bioconductor/2013-August/054486.html

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by Santosh Anand5.0k
3
gravatar for Kevin Blighe
16 months ago by
Kevin Blighe53k
Kevin Blighe53k wrote:

Just stumbled upon this because I am working with GenomicRanges. All that you need to do is find the overlaps and then use the hits as negative indices.

require(GenomicRanges)
gr1 <- makeGRangesFromDataFrame(
    data.frame(
        chr=rep('chr1',3),
        start=c(1,100,500),
        end=c(50,200,1000),
        gene=c("ENSG1","ENSG2","ENSG3")),
        keep.extra.columns=TRUE)

gr2 <- makeGRangesFromDataFrame(
    data.frame(
        chr=rep('chr1',3),
        start=c(100,500,1001),
        end=c(200,1000,1050),
        gene=c("ENSG2","ENSG3","ENSG4")),
        keep.extra.columns=TRUE)

gr1
GRanges object with 3 ranges and 1 metadata column:
      seqnames    ranges strand |     gene
         <Rle> <IRanges>  <Rle> | <factor>
  [1]      chr1      1-50      * |    ENSG1
  [2]      chr1   100-200      * |    ENSG2
  [3]      chr1  500-1000      * |    ENSG3

gr2
GRanges object with 3 ranges and 1 metadata column:
      seqnames    ranges strand |     gene
         <Rle> <IRanges>  <Rle> | <factor>
  [1]      chr1   100-200      * |    ENSG2
  [2]      chr1  500-1000      * |    ENSG3
  [3]      chr1 1001-1050      * |    ENSG4

Then subset the original objects with the negative indices of the overlaps:

gr1[-queryHits(findOverlaps(gr1, gr2, type="any")),] 
GRanges object with 1 range and 1 metadata column:
      seqnames    ranges strand |     gene
         <Rle> <IRanges>  <Rle> | <factor>
  [1]      chr1      1-50      * |    ENSG1

gr2[-queryHits(findOverlaps(gr2, gr1, type="any")),]
GRanges object with 1 range and 1 metadata column:
      seqnames    ranges strand |     gene
         <Rle> <IRanges>  <Rle> | <factor>
  [1]      chr1 1001-1050      * |    ENSG4

Merge the non-overlapping segments together:

c(
    gr1[-queryHits(findOverlaps(gr1, gr2, type="any")),],
    gr2[-queryHits(findOverlaps(gr2, gr1, type="any")),]
)

GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand |     gene
         <Rle> <IRanges>  <Rle> | <factor>
  [1]      chr1      1-50      * |    ENSG1
  [2]      chr1 1001-1050      * |    ENSG4

Kevin

ADD COMMENTlink modified 16 months ago • written 16 months ago by Kevin Blighe53k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1612 users visited in the last hour