Hg19 Coordinate To Cds Position
3
5
Entering edit mode
9.5 years ago
Halit ▴ 90

Hi,

Given a hg19 genomic coordinate, is there any easy way to map this coordinate to the position in the coding sequence of the respective reference sequence?

Example: Chr1, hg19 position = 12049235 -> UCSC knownGene id: 12312, CDS position: 14, Amino acid position: 5

Thanks

mapping genomics coordinates cds position • 7.3k views
0
Entering edit mode

up vote for AA position.

8
Entering edit mode
9.5 years ago
Martin Morgan ★ 1.6k

For what it's worth in R / Bioconductor load these packages

library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

(use source("http://bioconductor.org/biocLite.R"); biocLite(c("VariantAnnotation", "TxDb.Hsapiens.UCSC.hg19.knownGene")) the first time, to install these packages). Then create a convenient short-cut to the annotations, and a GRanges representing regions of interest (this could be millions of elements long, or derived from a VCF file)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
roi <- GRanges("chr1", IRanges(12049235, width=1))

then locate the variants in the context of the annotation

loc <- refLocsToLocalLocs(roi, txdb)

The return value tells us that it hits two CDS, at the 10th nucleotide / 4th protein location of each.

> loc
GRanges with 2 ranges and 5 elementMetadata cols:
seqnames               ranges strand |    cdsLoc              proteinLoc
<Rle>            <IRanges>  <Rle> | <IRanges> <CompressedIntegerList>
[1]     chr1 [12049235, 12049235]      * |  [10, 10]                       4
[2]     chr1 [12049235, 12049235]      * |  [10, 10]                       4
queryID        txID     cdsID
<integer> <character> <integer>
[1]         1         650      1992
[2]         1         651      1992
---
seqlengths:
chr1
NA

To find out the gene to which the transcript belongs,

> select(txdb, key=values(loc)\$txID, cols="GENEID", keytype="TXID")
TXID GENEID
1  650   9927
2  651   9927

See ?refLocsToLocalLocs, ?select for some additional help. With

library(org.Hs.eg.db)

one could go on to discover

> select(org.Hs.eg.db, "9927", c("GENENAME", "SYMBOL"))
ENTREZID    GENENAME SYMBOL
1     9927 mitofusin 2   MFN2

See the Annotation and Variants work flows.

0
Entering edit mode

Just for anyone catching this thread now the refLocsToLocalLocs function is no longer part of the VariantAnnotation package. Shame, really.

best,

duff

0
Entering edit mode

@Duff But see GenomicAlignments::mapToAlignments and GenomicFeatures::mapToTranscripts.

0
Entering edit mode

@Martin Morgan Indeed and very useful. What I wanted was the old PROTEINLOC column as well. However I found the refLocsToLocalLocs code on github and adapted that the relevant code in that. Nice!

Cheers

0
Entering edit mode

Hi!

I'm dealing with the same issue, could you please write how you adapted the code of mapToTranscript to get the PROTEINLOC? and how can I refer the code to my own dataset (I have hundreds of differentially expressed exons with their genomic coordinates in a csv file)? Thanks!

0
Entering edit mode

@Duff I did find the code for refLocsToLocalLocs on github, but am having trouble sourcing the code since I'm pretty noob. Any advice would be greatly appreciated. I tried > script < getURL('https://raw.githubusercontent.com/Bioconductor/VCF_projects/master/VariantAnnotation/R/methods-refLocsToLocalLocs.R', ssl.verifypeer = FALSE)

eval(parse(text = script)) Error in setMethod("refLocsToLocalLocs", signature("GRanges", "TranscriptDb", : no existing definition for function ‘refLocsToLocalLocs’  since it appears I need the "whole package." Couldn't get VariantAnnotation 1.5.26 to install from github, and failed at using an old R (3.0.0) and building up an R library to install VariantAnnotation from https://bioarchive.galaxyproject.org/#/pkg/VariantAnnotation/1.5.26/

My problem is exactly as above, given an (hg38) coordinate, find the CDS position.

If you're still watching this and can help, my sincere thanks!

1
Entering edit mode
9.5 years ago

The following java file contains (among other things ) the code to loop over the exons and finds the position in the protein: https://github.com/lindenb/jsandbox/blob/master/src/sandbox/VCFAnnotator.java

see the classes KnownGene and PredictionAnnotator

0
Entering edit mode
9.5 years ago
deanna.church ★ 1.1k

If you only need to do this for a few positions, you can do this interactively in the NCBI Sviewer: http://www.ncbi.nlm.nih.gov/nuccore/NC_000019.9?report=graph

Select the 'Tools' menu and select the 'Markers' option. This will produce a new dialog- put your base of interest into the dialog. If you wish to give the marker a name you can do this. Select 'OK'. When you mouse over the marker in the 'ruler' you will get a context menu. Select 'Marker Details' This will produce a table provide the relative location of that base in all coordinate systems (genomic, transcript and protein) that are available for that point.

To great programmatically, but OK if you just need to look up a few positions.