Convert a FASTA Amino Acid Sequence to RNA (Reverse Central Dogma)
4.7 years ago

I see lots of examples of how to convert DNA to RNA, or RNA to amino acids, and plenty of examples using Python and Biopython. How could I do the reverse: amino acid sequence to RNA, and then RNA to DNA?

#### Define a dict that maps each amino acid to its corresponding codons

# Codons are written as DNA (T); replace T with U to get the RNA codons.
AA_codon = {
    'C': ['TGT', 'TGC'],
    'D': ['GAT', 'GAC'],
    'S': ['TCT', 'TCG', 'TCA', 'TCC', 'AGC', 'AGT'],
    'Q': ['CAA', 'CAG'],
    'M': ['ATG'],  # Start
    'N': ['AAC', 'AAT'],
    'P': ['CCT', 'CCG', 'CCA', 'CCC'],
    'K': ['AAG', 'AAA'],
    '*': ['TAG', 'TGA', 'TAA'],  # Stop
    'T': ['ACC', 'ACA', 'ACG', 'ACT'],
    'F': ['TTT', 'TTC'],
    'A': ['GCA', 'GCC', 'GCG', 'GCT'],
    'G': ['GGT', 'GGG', 'GGA', 'GGC'],
    'I': ['ATC', 'ATA', 'ATT'],
    'L': ['TTA', 'TTG', 'CTC', 'CTT', 'CTG', 'CTA'],
    'H': ['CAT', 'CAC'],
    'R': ['CGA', 'CGC', 'CGG', 'CGT', 'AGG', 'AGA'],
    'W': ['TGG'],
    'V': ['GTA', 'GTC', 'GTG', 'GTT'],
    'E': ['GAG', 'GAA'],
    'Y': ['TAT', 'TAC'],
}


#### ReverseTranslate(): Read over each character in string & join
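
The "read over each character and join" step could be filled in roughly as follows. This is only a minimal sketch, not an established method: the names `AA_TO_CODONS`, `reverse_translate`, `dna_to_rna` and the `pick` parameter are made up for illustration, and the default simply takes the first codon listed for each residue, which is an arbitrary choice among many valid ones. The table is rebuilt from the standard genetic code so the snippet runs on its own.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1).
# Codons are enumerated in T, C, A, G order: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Group the 64 DNA codons by the amino acid (or '*' for stop) they encode.
AA_TO_CODONS = {}
all_codons = ("".join(triplet) for triplet in product(BASES, repeat=3))
for codon, aa in zip(all_codons, AMINO_ACIDS):
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def reverse_translate(protein, pick=lambda codons: codons[0]):
    """Return ONE DNA sequence encoding `protein` (hypothetical helper).

    `pick` chooses a codon from the list of synonymous codons; the default
    takes the first one, which is arbitrary.
    """
    return "".join(pick(AA_TO_CODONS[aa]) for aa in protein.upper())


def dna_to_rna(dna):
    """Coding-strand DNA to mRNA: replace T with U."""
    return dna.replace("T", "U")


dna = reverse_translate("MKW")
rna = dna_to_rna(dna)
print(dna)  # ATGAAATGG
print(rna)  # AUGAAAUGG
```

Translating the result forward again recovers the original peptide, but the nucleotide sequence returned is only one of many that would do so.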


Basically, it's because the genetic code is so redundant: the number of possible nucleotide sequences that give rise to a particular amino acid sequence is too big to be useful for anything downstream. It would also be difficult to even represent the data in a useful way.


You can choose a codon frequency table and do a sort of codon optimization. I say "sort of" because you won't be taking into account PTMs (post-translational modifications) or other risks.
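
As a rough sketch of the frequency-table idea: pick, for each residue, the codon with the highest weight. The `CODON_USAGE` numbers below are placeholders invented for illustration, not measured values for any organism, and the table covers only four residues; `reverse_translate_optimized` is a hypothetical name. In practice you would load a full usage table for your expression host.

```python
# HYPOTHETICAL codon usage weights, for illustration only.
# These are NOT real measurements; substitute a real table for your organism.
CODON_USAGE = {
    "M": {"ATG": 1.0},
    "W": {"TGG": 1.0},
    "K": {"AAA": 0.7, "AAG": 0.3},
    "L": {"CTG": 0.5, "TTA": 0.1, "TTG": 0.1,
          "CTT": 0.1, "CTC": 0.1, "CTA": 0.1},
}


def most_frequent_codon(aa):
    """Return the highest-weight codon for one amino acid."""
    usage = CODON_USAGE[aa]  # raises KeyError for residues not in the table
    return max(usage, key=usage.get)


def reverse_translate_optimized(protein):
    """Back-translate by always taking the most frequent codon."""
    return "".join(most_frequent_codon(aa) for aa in protein.upper())


print(reverse_translate_optimized("MKLW"))  # ATGAAACTGTGG
```

Always taking the top codon is the simplest possible policy; sampling codons in proportion to their weights is another option.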


I was under the impression that Biopython had method(s) to handle this kind of problem, but I was not considering the "redundancy" of the codon table.

4.7 years ago
Rob 5.2k

The key point to note here is that a given codon implies a single amino acid, but a particular amino acid can result from several different codons, so the reverse operation does not have a unique solution. Consider an amino acid sequence a_1, a_2, ..., a_n, where c_1, c_2, ..., c_n are the numbers of possible codons for each amino acid in the sequence in turn. There are then \prod_{i=1}^{n} c_i different nucleotide sequences that encode the given amino acid sequence. This number grows exponentially with n, so enumerating all possibilities is not tractable for even moderately long sequences.
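
To put numbers on that product, here is a small sketch that computes \prod c_i for a peptide using the codon counts of the standard genetic code. The names `CODON_COUNTS` and `count_encodings` are just illustrative, and stop codons are left out.

```python
from math import prod  # Python 3.8+

# Number of synonymous codons per amino acid in the standard genetic code.
# The counts sum to 61 (64 codons minus 3 stop codons).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(protein):
    """Number of distinct coding sequences for `protein` (product of c_i)."""
    return prod(CODON_COUNTS[aa] for aa in protein.upper())


print(count_encodings("MW"))      # 1
print(count_encodings("MKL"))     # 12
print(count_encodings("L" * 10))  # 60466176
```

Only Met and Trp have a single codon each, so any sequence containing other residues has more than one encoding, and ten leucines alone already give about 60 million.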