Hi,
I have a list of mutations that use the protein sequence as a reference, e.g. JAK1 G1097D. I would like to obtain the genome coordinates of the actual base pairs encoding the affected codon. Ideally, I would supply only the gene identifier (including the organism) and the mutation, and get the genome coordinates back.
I can think of a couple of ways to address this problem, but I'm sure there must already be a solution out there. My web searching has failed me so far, so please do share your bookmarks!
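A minimal sketch of the mapping asked about here, assuming you already have the coding part of each exon of the chosen transcript as 1-based, inclusive genomic (start, end) pairs listed in transcript order (for example, taken from the CDS features of an annotation file). Codon n occupies CDS positions 3n-2 to 3n, so for G1097D these are 3289 to 3291. Walking through the exon blocks then converts each CDS position to a genomic coordinate; note that a codon can straddle an exon boundary. The function names and the exon blocks in the usage lines are hypothetical and made up for illustration; they are not taken from any existing tool.

```python
from typing import List, Tuple

# Hypothetical input format: coding part of each exon as (start, end),
# 1-based inclusive genomic coordinates, in transcript (5' to 3') order.
CdsBlocks = List[Tuple[int, int]]


def codon_cds_positions(aa_pos: int) -> Tuple[int, int, int]:
    """Return the 1-based CDS positions of the three bases of codon aa_pos."""
    if aa_pos < 1:
        raise ValueError("protein positions are 1-based")
    first = 3 * aa_pos - 2
    return (first, first + 1, first + 2)


def cds_to_genomic(cds_pos: int, blocks: CdsBlocks, strand: str) -> int:
    """Map one 1-based CDS position to a 1-based genomic coordinate."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    remaining = cds_pos
    for start, end in blocks:
        length = end - start + 1
        if remaining <= length:
            # Plus strand: count up from the block start.
            # Minus strand: the transcript runs from the block end downwards.
            return start + remaining - 1 if strand == "+" else end - remaining + 1
        remaining -= length
    raise ValueError("CDS position lies beyond the end of the coding sequence")


def codon_genomic_positions(aa_pos: int, blocks: CdsBlocks, strand: str) -> List[int]:
    """Genomic coordinates of the three codon bases, in codon (5' to 3') order."""
    return [cds_to_genomic(p, blocks, strand) for p in codon_cds_positions(aa_pos)]


if __name__ == "__main__":
    print(codon_cds_positions(1097))  # (3289, 3290, 3291)
    # Made-up two-exon transcript on the plus strand; codon 2 spans the exon boundary.
    print(codon_genomic_positions(2, [(100, 104), (200, 210)], "+"))  # [103, 104, 200]
    # Made-up minus-strand transcript: blocks are still listed in transcript order.
    print(codon_genomic_positions(4, [(200, 210), (100, 104)], "-"))  # [201, 200, 104]
```

The real exon blocks and strand have to come from the specific transcript the protein position refers to; different isoforms of the same gene can place the same protein position at different genomic coordinates.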
Thanks!
Thanks for the hint! I seem to struggle with the input, though: none of my attempts produced any results for the JAK1 mutation mentioned above. I've tried all combinations of ENST* and ENSG* identifiers with the mutation written as 1097Gly>Asp, which seemed to be the notation used in the examples, and as Gly1097Asp, which seemed to be the one outlined on the SVN page. Using the single-letter amino acid codes didn't work either. For example:
ENST00000342505:p.1097Gly>Asp
ENSG00000162434:p.Gly1097Asp
I've tried this via the website using the default settings. Any additional hints?
HGVS notation for a protein change is :p.<3-letter amino acid><position><3-letter amino acid>, i.e. :p.Gly1097Asp. To use it you need a protein ID (e.g. NP or ENSP) or a protein name. The following work:
JAK1:p.Gly1097Asp
ENSP00000343204:p.Gly1097Asp
If you use a gene ID it doesn't work at all; if you use a transcript ID it expects CDS coordinates.
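Since the original list uses single-letter codes such as G1097D, a small converter into the three-letter HGVS protein form described above may save some typing. This is a sketch: it handles only simple amino acid substitutions (no deletions, insertions or frameshifts), and `to_hgvs_protein` is a hypothetical helper, not part of any tool. The prefix you pass still has to be a protein ID or protein name rather than an ENSG or ENST identifier.

```python
import re

# Standard one-letter to three-letter amino acid codes; "*" is a stop (Ter).
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}

# Reference residue, 1-based position, alternate residue, e.g. "G1097D".
SUBSTITUTION = re.compile(r"([ACDEFGHIKLMNPQRSTVWY*])(\d+)([ACDEFGHIKLMNPQRSTVWY*])")


def to_hgvs_protein(protein_id: str, short: str) -> str:
    """Turn e.g. ("JAK1", "G1097D") into "JAK1:p.Gly1097Asp"."""
    match = SUBSTITUTION.fullmatch(short.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised substitution: {short!r}")
    ref, pos, alt = match.groups()
    return f"{protein_id}:p.{THREE_LETTER[ref]}{pos}{THREE_LETTER[alt]}"


if __name__ == "__main__":
    print(to_hgvs_protein("JAK1", "G1097D"))             # JAK1:p.Gly1097Asp
    print(to_hgvs_protein("ENSP00000343204", "G1097D"))  # ENSP00000343204:p.Gly1097Asp
```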
Of course, using an actual protein ID makes sense! :) Thanks for pointing that out!