Hello everyone,
I am doing some protein engineering and would like to write a Python script (or find an existing one) that identifies a single codon, written with ambiguous (IUPAC) nucleotide bases, that codes for the amino acids of interest while limiting the number of undesired amino acids.
For instance, if the amino acid at position 1 could be D or E, the available codons are:
D = 'GAC', 'GAT'
E = 'GAA', 'GAG'
Therefore, the optimal codon would be GAN.
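The D/E example can be reproduced by merging the codons column by column and looking up the IUPAC ambiguity code for the set of bases seen at each position. This is only a minimal sketch: `IUPAC_CODES` and `collapse_codons` are names made up for illustration, not part of any library.

```python
# IUPAC nucleotide ambiguity codes, keyed by the set of bases each code stands for
IUPAC_CODES = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def collapse_codons(codons):
    """Merge codons position by position into one degenerate codon (hypothetical helper)."""
    # zip(*codons) yields one tuple of bases per codon position
    return "".join(IUPAC_CODES[frozenset(column)] for column in zip(*codons))


print(collapse_codons(["GAC", "GAT", "GAA", "GAG"]))  # GAN
```

Note that a column-wise merge covers every input codon but may also cover codons for other amino acids when the inputs differ at more than one position, so the result should still be checked against the codon table.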
I am a complete beginner in Python, so this may be a non-trivial task for me. So far I have written some code that back-translates the listed amino acids into all of their available codons (see below). I assume the next step is to find the nucleotide positions within those codons that match perfectly (positions 1 and 2 in the example above). Does anybody know of an existing script that can do this, or could you point me in the direction of where to go from here?
Thanks
import itertools
from Bio.Seq import Seq  # Bio.Alphabet was removed in Biopython 1.78, so no alphabet is passed

# Standard genetic code: amino acid -> DNA codons
amino_acid_codon = dict(
    I=['ATA', 'ATC', 'ATT'], M=['ATG'], T=['ACA', 'ACC', 'ACG', 'ACT'], N=['AAC', 'AAT'], K=['AAA', 'AAG'],
    L=['CTA', 'CTC', 'CTG', 'CTT', 'TTA', 'TTG'], P=['CCA', 'CCC', 'CCG', 'CCT'], H=['CAC', 'CAT'], Q=['CAA', 'CAG'],
    R=['CGA', 'CGC', 'CGG', 'CGT', 'AGA', 'AGG'], V=['GTA', 'GTC', 'GTG', 'GTT'], A=['GCA', 'GCC', 'GCG', 'GCT'],
    D=['GAC', 'GAT'], E=['GAA', 'GAG'], G=['GGA', 'GGC', 'GGG', 'GGT'],
    F=['TTC', 'TTT'], S=['TCA', 'TCC', 'TCG', 'TCT', 'AGC', 'AGT'], Y=['TAC', 'TAT'], C=['TGC', 'TGT'], W=['TGG'],
)

my_prot = Seq("ED")  # amino acids allowed at the position of interest

# Collect the codon list of each amino acid
amino_acids = []
for residue in my_prot:
    print(residue, amino_acid_codon[residue])
    amino_acids.append(amino_acid_codon[residue])

# Pool the codons of all listed amino acids into a single list
all_codons = list(itertools.chain.from_iterable(amino_acids))
print(all_codons)
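One possible way to go from here, sketched under some assumptions: since there are only 15 IUPAC letters, all 15^3 = 3375 degenerate codons can be enumerated, expanded back into concrete codons, and ranked by how many undesired amino acids they encode. The function names (`expand`, `rank_degenerate_codons`) and the tie-breaking rule (prefer codons that expand to fewer concrete codons) are illustrative choices, not an established method. The sketch uses the standard genetic code and treats stop codons (`*`) as undesired.

```python
import itertools

# IUPAC ambiguity code -> the bases it stands for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code in TCAG order ('*' marks a stop codon)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}


def expand(degenerate_codon):
    """List every concrete codon covered by a degenerate codon."""
    options = (IUPAC[base] for base in degenerate_codon)
    return ["".join(codon) for codon in itertools.product(*options)]


def rank_degenerate_codons(wanted):
    """Rank degenerate codons that encode all wanted amino acids.

    Returns tuples (n_unwanted, n_codons, codon, unwanted), best first.
    """
    wanted = set(wanted)
    candidates = []
    for letters in itertools.product(IUPAC, repeat=3):
        codon = "".join(letters)
        concrete = expand(codon)
        encoded = {CODON_TABLE[c] for c in concrete}
        if wanted <= encoded:  # every wanted amino acid is covered
            unwanted = sorted(encoded - wanted)
            candidates.append((len(unwanted), len(concrete), codon, unwanted))
    return sorted(candidates)


# Show the top-ranked candidates for the D/E example
for n_unwanted, n_codons, codon, unwanted in rank_degenerate_codons("DE")[:5]:
    print(codon, "codons:", n_codons, "unwanted:", unwanted)
```

For D and E, GAN appears among the candidates with no undesired amino acids; with the tie-breaker assumed here, two-codon options such as GAK (GAG, GAT) rank ahead of it because they cover both amino acids with fewer concrete codons.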