Convert a FASTA Amino Acid Sequence to RNA (Reverse Central Dogma)
1
0
Entering edit mode
7.1 years ago

I see lots of example on how to convert DNA to RNA or RNA to Amino. I see plenty examples using Python and Biopython. How could I do the reverse? Amino acid sequence to RNA and then RNA to DNA.

Define a dict that maps Amino acids to the corresponding codon

AA_codon = {
'C': ['TGT', 'TGC'], 
'A': ['GAT', 'GAC'], 
'S': ['TCT', 'TCG', 'TCA', 'TCC', 'AGC', 'AGT'], 
'G': ['CAA', 'CAG'], 
 'M': ['ATG'], #Start
 'A': ['AAC', 'AAT'], 
 'P': ['CCT', 'CCG', 'CCA', 'CCC'], 
 'L': ['AAG', 'AAA'], 
 'Q': ['TAG', 'TGA', 'TAA'], #Stop
 'T': ['ACC', 'ACA', 'ACG', 'ACT'], 
 'P': ['TTT', 'TTC'], 
 'A': ['GCA', 'GCC', 'GCG', 'GCT'], 
 'G': ['GGT', 'GGG', 'GGA', 'GGC'], 
 'I': ['ATC', 'ATA', 'ATT'], 
 'L': ['TTA', 'TTG', 'CTC', 'CTT', 'CTG', 'CTA'], 
 'H': ['CAT', 'CAC'], 
 'A': ['CGA', 'CGC', 'CGG', 'CGT', 'AGG', 'AGA'], 
 'T': ['TGG'], 
 'V': ['GTA', 'GTC', 'GTG', 'GTT'], 
 'G': ['GAG', 'GAA'], 
 'T': ['TAT', 'TAC'] }

ReverseTranslate(): Read over each character in string & join

sequence Python • 3.2k views
ADD COMMENT
3
Entering edit mode

Basically, it's because there are so many possible combinations, and so much redundancy, that the number of possible sequences that give rise to a particular amino acid, is too big to be useful for anything downstream. It would also be difficut to even represent the data in a useful way.

ADD REPLY
2
Entering edit mode

You can choose a frequency table, and sort of do a codon optimization. I say sort of, because you won't be taking into account PTMs, or other risks.

ADD REPLY
0
Entering edit mode

I was under the impression that Biopython had method(s) to work this kind of problem, but I was not considering the "redundancy" of the codon table.

ADD REPLY
7
Entering edit mode
7.1 years ago
Rob 6.9k

The key point you should note here is that a given codon implies a single amino acid, but a particular amino acid could result from multiple different codons. The reverse operation does not have a unique solution. Consider an amino acid sequence a_1, a_2, ..., a_n, where c_1, c_2, ..., c_n are the number of possible codons for each amino acid in this sequence in turn --- then there are \prod_{i=1}^{n} c_i possible different ways to generate the given amino acid sequence in terms of nucleotide sequences; this is exponential growth, and enumerating all possibilities won't be tractable for even reasonably long sequences.

ADD COMMENT

Login before adding your answer.

Traffic: 2272 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6