GRanges : setdiff and keep extra columns
1
0
Entering edit mode
6.8 years ago

Hi,

I've two GRanges as :

> gene.pos.gr
GRanges object with 63677 ranges and 1 metadata column:
                seqnames               ranges strand | ensembl_gene_id
                   <Rle>            <IRanges>  <Rle> |     <character>
      [1]    HG991_PATCH [66119285, 66465398]      + | ENSG00000261657
      [2]             13 [23551994, 23552136]      - | ENSG00000223116
      [3]             13 [23708313, 23708703]      + | ENSG00000233440
      [4]             13 [23726725, 23726825]      - | ENSG00000207157
      [5]             13 [23743974, 23744736]      - | ENSG00000229483
      ...            ...                  ...    ... .             ...
  [63673] HSCHR17_1_CTG1     [ 62293, 253878]      - | ENSG00000262334
  [63674] HSCHR17_1_CTG1     [189016, 190077]      + | ENSG00000262737
  [63675] HSCHR17_1_CTG1     [198829, 201112]      + | ENSG00000263267
  [63676] HSCHR17_1_CTG1     [220262, 225205]      + | ENSG00000262336
  [63677] HSCHR17_1_CTG1     [222361, 224506]      + | ENSG00000262005
  -------
  seqinfo: 265 sequences from an unspecified genome; no seqlengths

and

> exon.pos.gr
GRanges object with 738009 ranges and 1 metadata column:
                 seqnames               ranges strand | ensembl_gene_id
                    <Rle>            <IRanges>  <Rle> |     <character>
       [1]    HG991_PATCH [66119285, 66119659]      + | ENSG00000261657
       [2]    HG991_PATCH [66298434, 66298819]      + | ENSG00000261657
       [3]    HG991_PATCH [66314236, 66314392]      + | ENSG00000261657
       [4]    HG991_PATCH [66320895, 66321004]      + | ENSG00000261657
       [5]    HG991_PATCH [66339743, 66339847]      + | ENSG00000261657
       ...            ...                  ...    ... .             ...
  [738005] HSCHR17_1_CTG1     [200680, 201068]      + | ENSG00000263267
  [738006] HSCHR17_1_CTG1     [220262, 220486]      + | ENSG00000262336
  [738007] HSCHR17_1_CTG1     [224683, 225205]      + | ENSG00000262336
  [738008] HSCHR17_1_CTG1     [222361, 223093]      + | ENSG00000262005
  [738009] HSCHR17_1_CTG1     [224327, 224506]      + | ENSG00000262005
  -------
  seqinfo: 265 sequences from an unspecified genome; no seqlengths

I want to do the difference between them so I used difference <- setdiffgene.pos.gr,exon.pos.gr) It gives me :

> difference
GRanges object with 290454 ranges and 0 metadata columns:
                 seqnames                 ranges strand
                    <Rle>              <IRanges>  <Rle>
       [1]    HG991_PATCH   [66119660, 66298433]      +
       [2]    HG991_PATCH   [66298820, 66314235]      +
       [3]    HG991_PATCH   [66314393, 66320894]      +
       [4]    HG991_PATCH   [66321005, 66339286]      +
       [5]    HG991_PATCH   [66339343, 66339742]      +
       ...            ...                    ...    ...
  [290450] HSCHR2_2_CTG12 [149868059, 149870556]      +
  [290451] HSCHR2_2_CTG12 [149870662, 149872728]      +
  [290452] HSCHR2_2_CTG12 [149872946, 149874163]      +
  [290453] HSCHR2_2_CTG12 [149874278, 149882769]      +
  [290454] HSCHR2_2_CTG12 [149882977, 149885671]      +
  -------
  seqinfo: 265 sequences from an unspecified genome; no seqlengths

How to keep the extra columns from the gene.pos.gr GRanges, especially the ensembl_gene_id column

Thanks

GenomicRanges • 7.3k views
ADD COMMENT
0
Entering edit mode

I am not sure if you can do that directly using any of the set functions (intersect, union etc) of GRanges. The alternative is to findOverlap between your gene.pos.gr and difference, and using that, get the required metadata and add it to the difference. For example, see https://stat.ethz.ch/pipermail/bioconductor/2013-August/054486.html

ADD REPLY
4
Entering edit mode
5.6 years ago

Just stumbled upon this because I am working with GenomicRanges. All that you need to do is find the overlaps and then use the hits as negative indices.

require(GenomicRanges)
gr1 <- makeGRangesFromDataFrame(
    data.frame(
        chr=rep('chr1',3),
        start=c(1,100,500),
        end=c(50,200,1000),
        gene=c("ENSG1","ENSG2","ENSG3")),
        keep.extra.columns=TRUE)

gr2 <- makeGRangesFromDataFrame(
    data.frame(
        chr=rep('chr1',3),
        start=c(100,500,1001),
        end=c(200,1000,1050),
        gene=c("ENSG2","ENSG3","ENSG4")),
        keep.extra.columns=TRUE)

gr1
GRanges object with 3 ranges and 1 metadata column:
      seqnames    ranges strand |     gene
         <Rle> <IRanges>  <Rle> | <factor>
  [1]      chr1      1-50      * |    ENSG1
  [2]      chr1   100-200      * |    ENSG2
  [3]      chr1  500-1000      * |    ENSG3

gr2
GRanges object with 3 ranges and 1 metadata column:
      seqnames    ranges strand |     gene
         <Rle> <IRanges>  <Rle> | <factor>
  [1]      chr1   100-200      * |    ENSG2
  [2]      chr1  500-1000      * |    ENSG3
  [3]      chr1 1001-1050      * |    ENSG4

Then subset the original objects with the negative indices of the overlaps:

gr1[-queryHits(findOverlaps(gr1, gr2, type="any")),] 
GRanges object with 1 range and 1 metadata column:
      seqnames    ranges strand |     gene
         <Rle> <IRanges>  <Rle> | <factor>
  [1]      chr1      1-50      * |    ENSG1

gr2[-queryHits(findOverlaps(gr2, gr1, type="any")),]
GRanges object with 1 range and 1 metadata column:
      seqnames    ranges strand |     gene
         <Rle> <IRanges>  <Rle> | <factor>
  [1]      chr1 1001-1050      * |    ENSG4

Merge the non-overlapping segments together:

c(
    gr1[-queryHits(findOverlaps(gr1, gr2, type="any")),],
    gr2[-queryHits(findOverlaps(gr2, gr1, type="any")),]
)

GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand |     gene
         <Rle> <IRanges>  <Rle> | <factor>
  [1]      chr1      1-50      * |    ENSG1
  [2]      chr1 1001-1050      * |    ENSG4

Kevin

ADD COMMENT

Login before adding your answer.

Traffic: 2454 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6