Hello, we have identified new mutations in the genomic sequence of the ABCD1 gene. I have cDNA mutation results that list the nucleotide changes, e.g. c.562 C>A. We need to convert our results to the protein sequence in order to perform several bioinformatic analyses on the protein. Can you please recommend a database that shows whether a mutation causes a frameshift? Also, please recommend a database that shows proper variant nomenclature. I know HGVS, but it only offers instructions. Is there a database that can help with variant nomenclature for proteins? I would also be very pleased if you could point me to a book or article with instructions on DNA-to-protein conversion. I am really stuck on this simple conversion!
Here's a simple workflow for converting all your sequences (I have no idea about variant nomenclature databases or any of that).
You can use your WT sequence and my code here to generate all of the FASTA sequences that correspond to your mutations.
You'll need a 'map file' that lists the sequence ID and the substitution being made.
You'll need to convert your notation from c.562 C>A to SequenceID,C562A (for example).
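If you have many variants, a small script can do this conversion for you. The sketch below assumes every entry is a simple single-nucleotide substitution written like c.562 C>A; the helper name `to_map_line` and the `ABCD1` sequence ID are hypothetical placeholders, and indels or intronic/UTR positions (c.100+1G>A, c.-5, c.*10) are deliberately rejected rather than guessed at.

```python
import re

# Matches simple cDNA substitutions such as "c.562 C>A" or "c.562C>A".
# Assumption: only single-base substitutions in the coding region are handled;
# indels, intronic offsets and UTR positions raise ValueError instead.
SUBSTITUTION = re.compile(r"^c\.(\d+)\s*([ACGT])>([ACGT])$")

def to_map_line(seq_id, hgvs_c):
    """Convert 'c.562 C>A' into the 'SequenceID,C562A' map-file format."""
    m = SUBSTITUTION.match(hgvs_c.strip())
    if m is None:
        raise ValueError(f"not a simple substitution: {hgvs_c!r}")
    pos, ref, alt = m.groups()
    return f"{seq_id},{ref}{pos}{alt}"

mutations = ["c.562 C>A"]  # your variant list (example taken from the question)
for mut in mutations:
    # "ABCD1" is a placeholder ID; use whatever ID your FASTA/map file expects
    print(to_map_line("ABCD1", mut))
```

Writing the output of this loop to a text file, one line per variant, gives you the map file.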
It will generate a mutated FASTA sequence for each input sequence/mutation pair.
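If you'd rather not depend on external code, the same idea can be sketched in plain Python: read map-file lines, apply each substitution to the WT sequence, and write one FASTA record per mutation. This assumes the WT sequence is the coding sequence starting at the A of the ATG (so HGVS c.1 is Python index 0) and that every change is a single-base substitution. `parse_map_line`, `apply_substitution`, `write_fasta` and the toy sequence are hypothetical names for illustration.

```python
def parse_map_line(line):
    """Split a map-file line like 'ABCD1,C562A' into (id, ref, pos, alt)."""
    seq_id, change = line.strip().split(",")
    return seq_id, change[0], int(change[1:-1]), change[-1]

def apply_substitution(cds, pos, ref, alt):
    """Return `cds` with the base at cDNA position `pos` replaced by `alt`.

    Assumes `cds` starts at the A of the ATG, so c.1 is index 0.
    """
    i = pos - 1
    if not 0 <= i < len(cds):
        raise IndexError(f"c.{pos} is outside the {len(cds)}-nt coding sequence")
    if cds[i].upper() != ref.upper():
        # A mismatch usually means the wrong transcript or reference was used
        raise ValueError(f"reference mismatch at c.{pos}: expected {ref}, found {cds[i]}")
    return cds[:i] + alt + cds[i + 1:]

def write_fasta(path, records):
    """Write (id, sequence) pairs to a FASTA file, 60 bases per line."""
    with open(path, "w") as fh:
        for seq_id, seq in records:
            fh.write(f">{seq_id}\n")
            for start in range(0, len(seq), 60):
                fh.write(seq[start:start + 60] + "\n")

wt = "ATGCGTTAA"  # toy stand-in; replace with the real ABCD1 coding sequence
records = []
for line in ["toy,C4A"]:  # lines read from your map file
    seq_id, ref, pos, alt = parse_map_line(line)
    records.append((f"{seq_id}_{ref}{pos}{alt}", apply_substitution(wt, pos, ref, alt)))
write_fasta("single.fasta", records)
```

The reference-base check is worth keeping: if c.562 in your WT sequence is not a C, your numbering or transcript does not match the one used to call the variant.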
You can then use this Biopython snippet to read in a file of mutated sequences and translate them to proteins.
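```python
from Bio import SeqIO

# Read each mutated cDNA record from the FASTA file
for s in SeqIO.parse('single.fasta', 'fasta'):
    # SeqRecord.translate() returns a new SeqRecord holding the protein;
    # '*' marks stop codons
    protein = s.translate()
    print(repr(protein))

# Output from an older Biopython release (one that still used alphabets):
# SeqRecord(seq=Seq('MSTTADQIAVQYPIPTYRFVVTIGDEQMCFQSVSGLDISYDTIEYRDGVGNWLQ...FH*', HasStopCodon(ExtendedIUPACProtein(), '*')), id='<unknown id>', name='<unknown name>', description='<unknown description>', dbxrefs=[])
```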
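To get from a translation to protein-level nomenclature, most of the work is codon arithmetic: c.N lies in codon (N − 1) // 3 + 1, so c.562 falls in codon 188. The sketch below hand-codes the standard genetic code (NCBI translation table 1) so it runs without Biopython, and builds an HGVS-style predicted description such as p.(Arg2Ser) for a single-base substitution. It also encodes the frameshift rule: a substitution never shifts the reading frame, while an indel does only when its net length change is not a multiple of 3. The helper names and the toy sequence are hypothetical; treat the output as a starting point to check against the HGVS recommendations.

```python
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, ..., GGG
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))

# One-letter to three-letter amino-acid codes; HGVS writes a stop as "Ter"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "Ter",
}

def translate(cds):
    """Translate an uppercase ACGT coding sequence codon by codon ('*' = stop)."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))

def protein_change(wt_cds, pos, alt):
    """Predicted HGVS-style consequence of substituting `alt` at c.<pos>."""
    codon_no = (pos - 1) // 3 + 1   # c.1-c.3 is codon 1, c.4-c.6 is codon 2, ...
    start = (codon_no - 1) * 3
    wt_codon = wt_cds[start:start + 3]
    offset = (pos - 1) % 3          # position of the change within the codon
    mut_codon = wt_codon[:offset] + alt + wt_codon[offset + 1:]
    wt_aa, mut_aa = CODON_TABLE[wt_codon], CODON_TABLE[mut_codon]
    if mut_aa == wt_aa:
        return f"p.({THREE_LETTER[wt_aa]}{codon_no}=)"  # synonymous
    # Missense, or nonsense when the new residue is Ter
    return f"p.({THREE_LETTER[wt_aa]}{codon_no}{THREE_LETTER[mut_aa]})"

def is_frameshift(deleted_len, inserted_len):
    """True if an indel's net length change is not a multiple of 3."""
    return (inserted_len - deleted_len) % 3 != 0

wt = "ATGCGTTAA"  # toy stand-in for the real ABCD1 coding sequence
print(translate(wt))               # MR*
print(protein_change(wt, 4, "A"))  # p.(Arg2Ser)
print(is_frameshift(1, 0))         # True: a 1-nt deletion shifts the frame
```

For your variant you would pass the real ABCD1 coding sequence with `pos=562` and `alt="A"`; since (562 − 1) % 3 is 0, the change hits the first base of codon 188.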
You can also use tblastx to search your cDNA query against a translated nucleotide database to find out whether any paralogs or orthologs match your wild-type or mutant sequences. https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=tblastx&PAGE_TYPE=BlastSearch&BLAST_SPEC=&LINK_LOC=blasttab