Hi,
I am struggling with the seemingly simple task of mapping genomic coordinates to a gene symbol (such as TP53).
This is what I did: I've read a bedgraph file containing read coverage information into R using rtracklayer's import function.
The result is the following:
> some_granges_object
GRanges with 667858 ranges and 1 metadata column:
                 seqnames    ranges strand |     score
                    <Rle> <IRanges>  <Rle> | <numeric>
       [1]           chr1    [x, x]      * |       155
       [2]           chr1    [x, x]      * |       154
       [3]           chr1    [x, x]      * |       235
       [4]           chr1    [x, x]      * |       237
       [5]           chr1    [x, x]      * |       236
       ...            ...       ...    ... .       ...
  [667854] chrUn_gl000241    [x, x]      * |       224
  [667855] chrUn_gl000241    [x, x]      * |       210
  [667856] chrUn_gl000241    [x, x]      * |       204
  [667857] chrUn_gl000241    [x, x]      * |       209
  [667858] chrUn_gl000241    [x, x]      * |       160
  ---
  seqlengths:
             chrM           chr1 ... chrUn_gl000249
               NA             NA ...             NA
In this particular sample only a few regions are covered by reads (maybe 100 genes?), and I would like to infer the gene name whenever a region exceeds a certain coverage value (let's say 150).
Ideally, I would like to add a few more columns to the GRanges object containing the Entrez Gene ID and the common gene symbol.
I've tried VariantAnnotation and GenomicFeatures, but I am stuck.
Are there any real coordinates in that GRanges object? All I can see in your example is [x, x].
Yes, I just removed the coordinates for privacy reasons...
I don't think the coordinates would be giving away too much information, to be honest :)
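Coming back to the original question, here is a minimal sketch of one possible approach, not a definitive answer. It assumes the data are on hg19 (the chrUn_gl000241 sequence name suggests this, but that is a guess) and that the TxDb.Hsapiens.UCSC.hg19.knownGene and org.Hs.eg.db annotation packages are installed. The toy `gr` object and its coordinates are made up for illustration and stand in for the imported bedGraph.

```r
library(GenomicRanges)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)  # assumption: the data are on hg19
library(org.Hs.eg.db)

# Toy stand-in for the imported bedGraph (coordinates are illustrative only)
gr <- GRanges(
  seqnames = c("chr17", "chr17", "chr1"),
  ranges   = IRanges(start = c(7572000, 7579000, 1),
                     end   = c(7572100, 7579100, 100)),
  score    = c(155, 120, 235)
)

# Keep only the regions above the coverage threshold
hi <- gr[gr$score > 150]

# One range per gene; the gene_id column holds Entrez Gene IDs
gene_ranges <- genes(TxDb.Hsapiens.UCSC.hg19.knownGene)

# Coverage ranges are unstranded, so ignore strand when overlapping
hits <- findOverlaps(hi, gene_ranges, ignore.strand = TRUE)

# Look up a symbol for every overlapping gene
entrez  <- gene_ranges$gene_id[subjectHits(hits)]
symbols <- mapIds(org.Hs.eg.db, keys = entrez, column = "SYMBOL",
                  keytype = "ENTREZID", multiVals = "first")

# One row per (region, gene) pair; regions without a gene are dropped
annotated <- hi[queryHits(hits)]
annotated$entrez_id <- entrez
annotated$symbol    <- unname(symbols)
annotated
```

Note that a region overlapping several genes appears once per gene in `annotated`, and regions that overlap no gene are left out. If you would rather keep exactly one row per input range, you would need to collapse the hits yourself, for example by pasting the symbols together per query index.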