Mutate protein sequence according to HGVS
4.8 years ago

Hi, before I reinvent the wheel: I need to modify a set of reference protein sequences in FASTA format according to variants given in HGVS format.

Can you suggest a tool or package that does this? A Python package would be best, but anything would do (R, Java, standalone...). The hgvs package seems promising, but from a quick glance I'm not sure it can do this.
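
Whatever ends up doing the actual mutating, the bookkeeping around it is small: read the FASTA, match each variant's accession to a record, and write FASTA back out. A minimal plain-Python sketch, assuming FASTA headers hold the unversioned accession as their first word (as in `>ENSP00000454338`) while the HGVS strings carry a version suffix (`.1`) that can be dropped for matching. All function names here are hypothetical, not from any package.

```python
from collections import defaultdict


def read_fasta(text: str) -> dict:
    """Parse FASTA text into {first header word: sequence}."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def accession_of(variant: str) -> str:
    """'ENSP00000454338.1:p.Ala37Thr' -> 'ENSP00000454338' (version dropped)."""
    return variant.split(":", 1)[0].rsplit(".", 1)[0]


def group_by_accession(variants):
    """Map each unversioned accession to its list of p. descriptions."""
    groups = defaultdict(list)
    for v in variants:
        groups[accession_of(v)].append(v.split(":", 1)[1])
    return dict(groups)


def write_fasta(records: dict, width: int = 60) -> str:
    """Serialise {name: sequence} back to FASTA, wrapping lines at `width`."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        lines.extend(seq[i : i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

Each variant is treated independently here; if several variants on one protein should be combined into a single allele, positions after an indel shift and need extra care.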

Example: given the sequence

>ENSP00000454338
FPIEAGDSRGLAAAPESQDSPEAVATEHNPVSGPCRASISPGRFVAALDATA

And variant:

ENSP00000454338.1:p.Ala37Thr

Return the input with Ala at position 37 changed to Thr:

>ENSP00000454338
FPIEAGDSRGLAAAPESQDSPEAVATEHNPVSGPCRTSISPGRFVAALDATA
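
For simple substitutions like this one, a regex and a three-letter-to-one-letter table are enough. A minimal sketch (the function name is hypothetical) that handles only single-residue missense and nonsense changes written with three-letter codes, checks the reference residue so a mismatched isoform or version fails loudly, and rejects anything else, including frameshifts:

```python
import re

# Three-letter -> one-letter amino-acid codes; "Ter" (stop) maps to "*"
_THREE = "Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys Met Phe Pro Ser Thr Trp Tyr Val Ter".split()
THREE_TO_ONE = dict(zip(_THREE, "ARNDCQEGHILKMFPSTWYV*"))

# Simple missense/nonsense only, e.g. "ENSP00000454338.1:p.Ala37Thr" or "p.Trp24Ter"
SUB_RE = re.compile(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def apply_protein_substitution(seq: str, variant: str) -> str:
    """Return seq with a single p. substitution applied (positions are 1-based)."""
    m = SUB_RE.search(variant)
    if m is None:
        raise ValueError(f"not a simple p. substitution: {variant}")
    ref, pos, alt = THREE_TO_ONE[m.group(1)], int(m.group(2)), THREE_TO_ONE[m.group(3)]
    # Check the reference residue before editing
    if not 1 <= pos <= len(seq) or seq[pos - 1] != ref:
        raise ValueError(f"reference mismatch at position {pos} (expected {ref})")
    if alt == "*":
        return seq[: pos - 1]  # nonsense: protein ends before the new stop
    return seq[: pos - 1] + alt + seq[pos:]
```

With the example above, `apply_protein_substitution("FPIEAGDSRGLAAAPESQDSPEAVATEHNPVSGPCRASISPGRFVAALDATA", "ENSP00000454338.1:p.Ala37Thr")` returns the sequence with `T` at position 37.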

This should be easy to code for most cases, but some may be tricky, e.g. frameshifts such as p.Met178ArgfsTer153.
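
The frameshift case shows why the p. string alone is not enough. In HGVS `fsTerN` the first changed residue counts as 1 and the new stop sits at position N, so p.Met178ArgfsTer153 describes a 329-residue protein whose residue 178 is Arg and whose residues 179 to 329 are not given anywhere in the description. The hypothetical helper below applies a frameshift only as far as the protein-level description allows, padding the unstated tail with `X`; getting the real tail needs the transcript sequence.

```python
import re

_THREE = "Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys Met Phe Pro Ser Thr Trp Tyr Val".split()
THREE_TO_ONE = dict(zip(_THREE, "ARNDCQEGHILKMFPSTWYV"))

# e.g. "p.Met178ArgfsTer153"; short forms such as "p.Met178fs" are not handled
FS_RE = re.compile(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})fsTer(\d+)$")


def frameshift_skeleton(seq: str, variant: str, unknown: str = "X") -> str:
    """Apply a p. frameshift as far as the protein-level description allows.

    fsTerN counts the first changed residue as 1, so the new frame has
    N - 1 residues before the stop; only the first of them is named.
    """
    m = FS_RE.search(variant)
    if m is None:
        raise ValueError(f"not a p. frameshift with a Ter position: {variant}")
    ref, pos, alt = THREE_TO_ONE[m.group(1)], int(m.group(2)), THREE_TO_ONE[m.group(3)]
    ter = int(m.group(4))
    if not 1 <= pos <= len(seq) or seq[pos - 1] != ref:
        raise ValueError(f"reference mismatch at position {pos} (expected {ref})")
    if ter < 2:
        raise ValueError("fsTer1 would be a nonsense change, not a frameshift")
    # Unchanged prefix + named first residue + (ter - 2) unknown residues
    return seq[: pos - 1] + alt + unknown * (ter - 2)
```

For instance, `frameshift_skeleton("MAKLV", "p.Lys3GlufsTer4")` gives `"MAEXX"`: residues 3 to 5 come from the new frame and position 6 is the stop.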


It does seem odd that the package doesn't have this already, but it should be fairly straightforward (I think) to code using it. I'd recommend getting the c. from the p. and making the change in the cDNA sequence for all changes where posedit.length > 1 (indels, fs, etc.).
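
Once you have a c. change and the coding sequence, the edit-and-translate step is short. A plain-Python sketch that does not use the hgvs package: it assumes c.1 is the A of the ATG, that positions lie inside the supplied sequence (no intronic or UTR offsets), and uses the standard genetic code. `apply_cds_delins` and `translate_to_stop` are hypothetical names. If the shifted frame runs past the annotated stop codon, the sequence you pass in must include the 3' UTR, otherwise translation simply ends at the last complete codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...)
_BASES = "TCAG"
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AAS)}


def apply_cds_delins(cds: str, start: int, end: int, alt: str = "") -> str:
    """Replace c.start..c.end (1-based, inclusive) with alt; alt="" is a deletion."""
    if not 1 <= start <= end <= len(cds):
        raise ValueError("positions must lie inside the sequence")
    return cds[: start - 1] + alt + cds[end:]


def translate_to_stop(cds: str) -> str:
    """Translate from c.1 until the first stop codon or the last complete codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i : i + 3].upper(), "X")  # ambiguous bases -> X
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

For example, deleting c.7 from `ATGGCTAAACTAGCCTAA` (which translates to `MAKLA`) shifts the frame so that `TAG` is read in-frame, and `translate_to_stop(apply_cds_delins(cds, 7, 7))` gives `MAN`, i.e. p.Lys3AsnfsTer2.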

4.8 years ago
Emily

There's a Protein Sequence plugin for the Ensembl VEP. Run VEP with HGVS input and this plugin to get the sequences in your output. It's open source, so you can also borrow the code and adapt it to your own workflow.
