You haven't said what the error was with your code, but the following subtly modified version works for me:
Input:
>Pep1
PPP
>Pep2
MPG
Code:
import sys
from Bio import SeqIO
from itertools import product
# Mock codon table from the question (one list of codons per amino acid)
amino_x = {'M': ['GCU', 'GCC', 'GCA'],
           'P': ['UGU', 'UGC'],
           'G': ['AAA']}
for protein in SeqIO.parse(sys.argv[1], "fasta"):
    # Collect the list of possible codons for each residue, in order
    amino_acids = []
    for residue in protein:
        amino_acids.append(amino_x[residue])
    print("{} -> {}".format(protein.id, amino_acids))
    # product() yields every combination of one codon per residue
    for a in product(*amino_acids):
        print(a)
Yields:
Pep1 -> [['UGU', 'UGC'], ['UGU', 'UGC'], ['UGU', 'UGC']]
('UGU', 'UGU', 'UGU')
('UGU', 'UGU', 'UGC')
('UGU', 'UGC', 'UGU')
('UGU', 'UGC', 'UGC')
('UGC', 'UGU', 'UGU')
('UGC', 'UGU', 'UGC')
('UGC', 'UGC', 'UGU')
('UGC', 'UGC', 'UGC')
Pep2 -> [['GCU', 'GCC', 'GCA'], ['UGU', 'UGC'], ['AAA']]
('GCU', 'UGU', 'AAA')
('GCU', 'UGC', 'AAA')
('GCC', 'UGU', 'AAA')
('GCC', 'UGC', 'AAA')
('GCA', 'UGU', 'AAA')
('GCA', 'UGC', 'AAA')
Is that not what you were looking for?
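If you'd rather have each back-translation as a single RNA string than as a tuple of codons, you can join the codons as they come out of `product()`. Here is a minimal sketch, assuming the same mock `amino_x` table as above (which is not the real genetic code) and using a plain string instead of a SeqIO record so it runs without an input file. `back_translations` is just a hypothetical helper name.

```python
from itertools import product

# The mock codon table from the question (NOT the real genetic code:
# GCU/GCC/GCA actually encode Ala, UGU/UGC Cys, and AAA Lys).
amino_x = {'M': ['GCU', 'GCC', 'GCA'],
           'P': ['UGU', 'UGC'],
           'G': ['AAA']}

def back_translations(peptide, table):
    """Yield every RNA string that `table` maps back to `peptide`."""
    # One list of candidate codons per residue, in sequence order
    codon_choices = [table[residue] for residue in peptide]
    for codons in product(*codon_choices):
        # Concatenate the chosen codons into one RNA sequence
        yield "".join(codons)

for rna in back_translations("MPG", amino_x):
    print(rna)
```

Inside the SeqIO loop you would call `back_translations(str(protein.seq), amino_x)` instead of passing a literal string.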
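Just FYI, for a long sequence this back translation is going to get huge: the number of possible RNA sequences is the product of the number of codons available for each residue, so it grows exponentially with peptide length.
For a peptide of just 19 residues, the run time is already over 5 seconds (there is a reason Biopython doesn't offer back translation natively):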
real 0m5.169s
user 0m0.568s
sys 0m0.376s
For 30 residues this climbs to over 48 minutes, and I have no idea how much longer it would have run: I killed it after I got bored!
File "backtranslate.py", line 17, in <module>
print(a)
KeyboardInterrupt
real 48m41.443s
user 9m37.704s
sys 6m38.621s
And this is only considering 3 amino acids with a total of 6 codons. It will quickly become computationally intractable if you want to do anything more complicated.
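You can check how big the output will be before generating anything, because the count is just the product of the codon-list lengths. A small sketch, assuming the same mock `amino_x` table; `count_back_translations` is a hypothetical helper, and `math.prod` needs Python 3.8 or later.

```python
import math

# Mock codon table from the question (not the real genetic code)
amino_x = {'M': ['GCU', 'GCC', 'GCA'],
           'P': ['UGU', 'UGC'],
           'G': ['AAA']}

def count_back_translations(peptide, table):
    """Number of RNA sequences that back-translate to `peptide`."""
    # Multiply the number of codon options for each residue
    return math.prod(len(table[residue]) for residue in peptide)

print(count_back_translations("MPG", amino_x))     # 3 * 2 * 1 = 6
print(count_back_translations("P" * 30, amino_x))  # 2**30 = 1073741824
```

Even a 30-residue peptide using only the two-codon `P` comes to over a billion tuples to print, which is consistent with how long the 30-residue run took.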
Please correct your title; I believe you mean "How to back-translate a protein into an RNA sequence with Python?"
Show the code you wrote so far.
ok
My mock proteins are in a FASTA file called HIJ2.fasta. The protein sequences are MPG and MPPP.
This is my code, using a list.
Sorry, I tried to put the code in a code block, but it didn't work.
Like this?
Output:
HIJ2.fasta
Will this code work for longer protein sequences? And how can I make it print only, say, 3 RNA sequences rather than all of them?
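Since `product()` generates its combinations lazily, you can stop after the first few with `itertools.islice`, which stays fast even for long peptides; the total count still explodes, you just never generate it. If you want a varied handful instead of the first few in `product()` order, you can pick a random codon per residue. A minimal sketch of both, assuming the mock `amino_x` table from the question; the function names are hypothetical.

```python
import random
from itertools import islice, product

# Mock codon table from the question (not the real genetic code)
amino_x = {'M': ['GCU', 'GCC', 'GCA'],
           'P': ['UGU', 'UGC'],
           'G': ['AAA']}

def first_n_back_translations(peptide, table, n):
    """Return the first n back-translations in product() order.

    product() yields combinations lazily, so islice stops after n
    without generating the rest.
    """
    codon_choices = [table[residue] for residue in peptide]
    return ["".join(codons) for codons in islice(product(*codon_choices), n)]

def random_back_translations(peptide, table, n, seed=None):
    """Return n back-translations, choosing a random codon per residue.

    Sampling is with replacement, so duplicates are possible.
    """
    rng = random.Random(seed)
    return ["".join(rng.choice(table[residue]) for residue in peptide)
            for _ in range(n)]

print(first_n_back_translations("MPPP", amino_x, 3))
# ['GCUUGUUGUUGU', 'GCUUGUUGUUGC', 'GCUUGUUGCUGU']
print(random_back_translations("MPPP", amino_x, 3, seed=42))
```

The first approach gives sequences that differ only near the end; the random one spreads the variation across the whole sequence.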