Question: Annotate Granges Object With Gene-Symbol
1
gravatar for tnw01a
7.0 years ago by
tnw01a10
tnw01a10 wrote:

Hi,

I am struggling with the seemingly very simple task to map genomic coordinates to a gene symbol (such as TP53).

This is what I did: I've read a bedgraph file containing read coverage information into R using rtracklayer's import function.

The result is the following:


> some_granges_object
GRanges with 667858 ranges and 1 metadata column:
                 seqnames             ranges strand   |     score
                    <rle>          <iranges>  <rle>   | <numeric>
       [1]           chr1 [x, x]      *   |       155
       [2]           chr1 [x, x]      *   |       154
       [3]           chr1 [x, x]      *   |       235
       [4]           chr1 [x, x]      *   |       237
       [5]           chr1 [x, x]      *   |       236
       ...            ...                ...    ... ...       ...
  [667854] chrUn_gl000241     [x, x]      *   |       224
  [667855] chrUn_gl000241     [x, x]      *   |       210
  [667856] chrUn_gl000241     [x, x]      *   |       204
  [667857] chrUn_gl000241     [x, x]      *   |       209
  [667858] chrUn_gl000241     [x, x]      *   |       160
  ---
  seqlengths:
                    chrM                  chr1 ...        chrUn_gl000249
                      NA                    NA ...                    NA

In this particular sample there are only a few regions covered by reads (maybe 100 genes?) and I would like to infer the gene names as soon as I have a region above a certain coverage value (lets say 150).

Ideally I would like to add a few more columns to the GRanges object with an Entrez gene Id and a common gene symbol.

I've tried to use VariantAnnotation, GenomicFeatures and am stuck.

bioconductor • 5.6k views
ADD COMMENTlink modified 7.0 years ago • written 7.0 years ago by tnw01a10

Are there any real coordinates in that GRanges object? All I can see in your example is [x, x].

ADD REPLYlink written 7.0 years ago by Neilfws49k

Yes, I just removed the coordinates for privacy reasons...

ADD REPLYlink modified 7.0 years ago • written 7.0 years ago by tnw01a10
1

I don't think the coordinates would be giving away too much information, to be honest :)

ADD REPLYlink modified 7.0 years ago • written 7.0 years ago by Neilfws49k
1
gravatar for Michael Dondrup
7.0 years ago by
Bergen, Norway
Michael Dondrup47k wrote:

Possibly, the most straight-forward way would include the following steps:

  1. Get or make the complete genome annotation as a GFF file with Ensembl GeneIDs (including gene symbols for some genes, if possible) and import via package rtracklayer
  2. calculate overlaps of some_granges_objects_at... with the gene annotation regions using findOverlaps
  3. alternatively: use package biomaRt to map geneIDs to gene symbols, if you didn't include gene symbols in step one

Please let us know if you need more specific help with any of those steps.

ADD COMMENTlink written 7.0 years ago by Michael Dondrup47k

I already tried using findOverlaps:

transcriptsByGene = transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by="gene")
overlaps = findOverlaps(some_granges_object, transcriptsByGene)

But I couldn't figure out what to do with the resulting data structure:

> overlaps
Hits of length 722867
queryLength: 667858
subjectLength: 22932
       queryHits subjectHits 
        <integer>   <integer> 
 1             71         608 
 2             72         608 
 3             73         608 
 4             74         608 
 5             75         608 
ADD REPLYlink modified 7.0 years ago • written 7.0 years ago by tnw01a10

FYI, it looks like some of your queries don't actually overlap a gene. You might want to use nearest() in that case. For extracting the indices from overlaps, you can use queryHits(overlaps) and subjectHits(overlaps) (have a look at help('HitsList-class') if you're unsure what the numbers there mean).

ADD REPLYlink written 7.0 years ago by Devon Ryan97k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1466 users visited in the last hour