Hi, before I reinvent the wheel: I need to modify a set of reference protein sequences in FASTA format according to given variants in HGVS format.
Can you suggest any tool or package to do it? A Python package would be best but any would do (R, Java, standalone...). The hgvs package seems promising but from a quick glance I'm not sure it can do it...
Example: given sequence
>ENSP00000454338
FPIEAGDSRGLAAAPESQDSPEAVATEHNPVSGPCRASISPGRFVAALDATA
And variant:
ENSP00000454338.1:p.Ala37Thr
Return the input with Ala at position 37 changed to Thr:
>ENSP00000454338
FPIEAGDSRGLAAAPESQDSPEAVATEHNPVSGPCRTSISPGRFVAALDATA
This should be easy to code for most cases, but some may be tricky, e.g. a frameshift such as Met178ArgfsTer153.
It does seem weird that they don't have it already, but it should be pretty straightforward (I think) to code using their package. I'd recommend getting c. from the p. and making a change in the cDNA sequence for all changes where posedit.length > 1 (indels, fs, etc).
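For the simple single-residue substitutions (like the Ala37Thr example in the question), something along these lines might be enough. This is only a minimal sketch in plain Python that does not use the hgvs package at all: the function name apply_missense and the regular expression are made up for illustration, and it assumes variants of the form ACCESSION:p.XxxNNNYyy with three-letter codes for the 20 standard residues. Anything else (fs, Ter, del, ins, dup...) raises an error rather than being guessed at.

```python
import re

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Hypothetical pattern: matches only simple substitutions such as
# "ENSP00000454338.1:p.Ala37Thr"; frameshifts, stops and indels do not match.
SUBST_RE = re.compile(
    r"^(?P<acc>[^:]+):p\.(?P<ref>[A-Z][a-z]{2})(?P<pos>\d+)(?P<alt>[A-Z][a-z]{2})$"
)


def apply_missense(seq, variant):
    """Return seq with a single-residue substitution applied (illustrative helper)."""
    m = SUBST_RE.match(variant)
    if m is None:
        raise ValueError(f"Not a simple substitution: {variant}")

    ref = THREE_TO_ONE.get(m["ref"])
    alt = THREE_TO_ONE.get(m["alt"])
    if ref is None or alt is None:
        # e.g. "Ter" or a non-standard residue code
        raise ValueError(f"Unsupported residue code in: {variant}")

    idx = int(m["pos"]) - 1  # HGVS positions are 1-based
    if idx < 0 or idx >= len(seq):
        raise ValueError(f"Position {m['pos']} is outside the sequence")
    if seq[idx] != ref:
        # Guard against applying a variant to the wrong reference sequence.
        raise ValueError(
            f"Reference mismatch at {m['pos']}: expected {ref}, found {seq[idx]}"
        )

    return seq[:idx] + alt + seq[idx + 1:]


reference = "FPIEAGDSRGLAAAPESQDSPEAVATEHNPVSGPCRASISPGRFVAALDATA"
mutated = apply_missense(reference, "ENSP00000454338.1:p.Ala37Thr")
print(">ENSP00000454338")
print(mutated)
```

The reference check is there on purpose: if the residue at that position is not the one named in the variant, you are probably using a different version of the protein. The frameshift and indel cases would still need the cDNA route described above.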