Retrieve mutation position and ID for a mutation in HGVS format
10.2 years ago
vigprasud

How can I find the chromosome, position and ID of a mutation represented in HGVS format?

Example:

Gene: TMEM231    cdna_Change: NM_001077418.1:c.582+3A>G    protein_change: p.?

The mutations are in HGVS format. How and where can I find the rs#, chr and pos for this particular mutation?

I have a set of 10,000 mutations and would like to annotate them with their chr, pos and rs#.

What programming languages do you know? This could be done in R (and presumably biopython/bioperl) relatively easily.

I know Python and R.

If VEP doesn't work for you, you can do this in R. The general steps would be:

  1. Load the file as a data frame and parse the cDNA information to split the transcript ID from the position information.
  2. Load a TxDb that contains these transcript IDs (not all of them do).
  3. Apply a function to each transcript to calculate the cDNA position of each exon (you'd just use its 5'- or 3'-most coordinate).
  4. Now you have numbers you can compare. For each mutation, extract the appropriate transcript and determine (A) which exon the position falls in (or, as in your example, the intron following an exon), then (B) increment or decrement that exon's genomic position by the appropriate offset.
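The same steps translate fairly directly to Python. Below is a minimal sketch, assuming you already have, for each RefSeq transcript, its exon boundaries in transcript coordinates, the genomic coordinate of each exon's first transcript base, the transcript position of the CDS start (c.1) and the strand. The `Exon` class and the function names are hypothetical, not from any library. It only handles simple c. substitutions like the one in the question, and it assumes the transcript aligns to the genome without gaps, which is not true for every RefSeq transcript.

```python
import re
from dataclasses import dataclass

# Simple coding-DNA substitutions such as "NM_001077418.1:c.582+3A>G".
# Assumption: only c.N or c.N+/-M substitutions are handled; UTR positions
# (c.-N, c.*N), deletions, insertions, etc. would need extra rules.
C_SUB = re.compile(
    r"^(?P<ac>[A-Z]{2}_\d+\.\d+):c\.(?P<pos>\d+)(?P<offset>[+-]\d+)?"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


@dataclass
class Exon:
    tx_start: int  # first transcript (cDNA) base of the exon, 1-based
    tx_end: int    # last transcript base of the exon, 1-based
    g_start: int   # genomic coordinate of the base at tx_start


def parse_c_sub(hgvs: str):
    """Step 1: split the accession from the position information."""
    m = C_SUB.match(hgvs.strip())
    if m is None:
        raise ValueError(f"unsupported HGVS notation: {hgvs}")
    return m["ac"], int(m["pos"]), int(m["offset"] or 0), m["ref"], m["alt"]


def c_to_genomic(pos, offset, exons, cds_start, strand):
    """Steps 3-4: find the exon holding c.pos and apply the intronic offset.

    cds_start is the transcript position of c.1 (the A of the ATG);
    strand is +1 or -1 for the transcript's genomic orientation.
    """
    tx_pos = pos + cds_start - 1  # c. numbering starts at the CDS, not the transcript
    for exon in exons:
        if exon.tx_start <= tx_pos <= exon.tx_end:
            # walking along the transcript moves down the genome on the minus strand
            return exon.g_start + strand * (tx_pos - exon.tx_start + offset)
    raise ValueError(f"c.{pos} lies outside the supplied exons")
```

For the example in the question, `parse_c_sub("NM_001077418.1:c.582+3A>G")` gives the accession, `582`, the offset `3`, and the alleles; the exon table passed to `c_to_genomic` would come from a transcript annotation such as the TxDb in step 2.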

Ensembl's VEP (Variant Effect Predictor) should work fine for this, as it accepts HGVS notations on RefSeq transcripts as input. No programming needed, and the VEP documentation is excellent!
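Once VEP has run, the chr, pos and rs# can be pulled out of its output. A minimal Python sketch, assuming VEP's default tab-delimited output: metadata lines starting with `##`, a header row starting with `#Uploaded_variation`, and `Location` and `Existing_variation` columns (the latter filled in when VEP checks for known variants). `parse_vep_default` is a hypothetical helper, not part of VEP. VEP writes one line per variant per overlapping feature, so the sketch merges those lines per input variant.

```python
def parse_vep_default(lines):
    """Return {uploaded_variation: {"chr", "pos", "rs"}} from VEP default output lines.

    Assumes the column names used by VEP's default tab-delimited format.
    """
    results = {}
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and ## metadata
        fields = line.split("\t")
        if line.startswith("#"):
            # "#Uploaded_variation" -> "Uploaded_variation"
            header = [name.lstrip("#") for name in fields]
            continue
        if header is None:
            raise ValueError("data line found before the #Uploaded_variation header")
        row = dict(zip(header, fields))
        chrom, _, span = row["Location"].partition(":")  # "2:1000" or "2:1000-1001"
        start = int(span.split("-")[0])
        known = row.get("Existing_variation", "-").split(",")
        entry = results.setdefault(
            row["Uploaded_variation"], {"chr": chrom, "pos": start, "rs": set()}
        )
        # one VEP line per overlapping transcript: merge the rs IDs across them
        entry["rs"].update(v for v in known if v.startswith("rs"))
    return results
```

Usage would look like `with open("vep_output.txt") as fh: table = parse_vep_default(fh)`, where the file name is a placeholder for wherever you saved the VEP results.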

10.2 years ago

I would try VEP

I second VEP; it works quite well for this. I forget the limit on the number of variants in the online web interface, but you can either submit them that way in batches or use the command-line version. You just have to switch from the default setting, and then you can enter HGVS mutations on RefSeq sequences.
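For 10,000 mutations, a small script can split the list into input files with one HGVS notation per line, which can then be uploaded in batches or passed to the command-line VEP (for example with `--format hgvs`, plus `--refseq` for RefSeq transcripts; check these options against the VEP documentation for your version). A sketch: `write_vep_batches` and its default `batch_size` of 1000 are hypothetical, since the web interface's actual limit isn't stated here.

```python
from pathlib import Path


def write_vep_batches(hgvs_list, out_dir, batch_size=1000, prefix="vep_input"):
    """Write HGVS notations, one per line, into numbered files of <= batch_size lines.

    batch_size is a placeholder; set it to whatever limit your VEP setup allows.
    """
    # drop blank entries and surrounding whitespace before batching
    notations = [h.strip() for h in hgvs_list if h.strip()]
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(0, len(notations), batch_size):
        path = out / f"{prefix}_{i // batch_size + 1:03d}.txt"
        path.write_text("\n".join(notations[i:i + batch_size]) + "\n")
        paths.append(path)
    return paths
```

Writing plain files keeps the step independent of how VEP is run: the same batches work for the web upload and for a command-line loop.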

Thank you. That helps.

8.9 years ago
Reece

Also consider the Python hgvs package. [Disclosure: I'm one of the authors.]

As a non-Python user I didn't like this tool initially, but given its well-written documentation I was able to follow along. I'm posting an example in case someone like me is struggling.

I had an NC ID as follows, "NC_000002.11:g.113890610C>T", and I wanted the corresponding NP IDs:

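## Start the hgvs shell, which predefines helpers such as parse, am37, g_to_t and t_to_p
hgvs-shell

## Parse the genomic variant and find the transcripts it overlaps on GRCh37
var_g = parse("NC_000002.11:g.113890610C>T")
transcripts = am37.relevant_transcripts(var_g)

## Project the variant onto each transcript (c./n.), then onto the protein (p.)
for ac in sorted(transcripts):
    var_t = g_to_t(var_g, ac)
    var_p = t_to_p(var_t)
    print("-> " + str(var_t) + " (" + str(var_p) + ")")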

This returned all the NP IDs.

I see that Mutalyzer works for converting rsIDs to HGVS, but not the other way around.