Question

From amino acid sequence to DNA sequence using Reverse Translate software

0

3.9 years ago

Student ▴ 30

Hi. I have identified an amino acid sequence motif using MEME. Now I want to know which DNA sequence corresponds to this amino acid sequence: KDEKIKEIFEDLAKEERNHY.

It seems that the only software that does this conversion is Reverse Translate, but I do not understand how to properly set "the default codon usage table". The organism I am studying is Pyrococcus furiosus, so I think the E. coli parameters are not suitable for my purpose.

Thank you in advance.

DNA translation protein coordinate-mapping • 5.1k views

2

3.9 years ago

Shred ★ 1.6k

You're studying an organism that is known to live in extreme conditions, and its genome composition and codon usage differ from those of E. coli. Here you can find a codon usage table with the percentage of each codon: just select the GCG format and paste it into the Reverse Translate web tool. Keep in mind that reverse translation isn't an error-free approach: your output sequence could differ from the original one.

2

3.9 years ago

Carlo Yague 9.0k

The genetic code is degenerate: multiple codons can encode a single amino acid. For instance, alanine can be reverse translated into GCT, GCC, GCA or GCG. Consequently, there are tens of millions of DNA sequences corresponding to the KDEKIKEIFEDLAKEERNHY amino acid sequence, so if you wanted a simple and unique answer, that is not possible.
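To make the degeneracy concrete, here is a minimal Python sketch that counts how many distinct DNA sequences encode the motif. It assumes the standard genetic code; the names `CODON_TABLE`, `synonyms` and `count_encodings` are made up for this illustration.

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of synonymous codons per amino acid (e.g. 4 for alanine).
synonyms = Counter(CODON_TABLE.values())


def count_encodings(peptide):
    """Number of distinct DNA sequences that translate to `peptide`."""
    return prod(synonyms[aa] for aa in peptide)


motif = "KDEKIKEIFEDLAKEERNHY"
n_encodings = count_encodings(motif)
print(n_encodings)  # 42467328
```

The count is simply the product of the number of synonymous codons at each position, which is why even a 20-residue motif has far too many possible encodings to enumerate by hand.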

Concerning "the default codon usage table": it gives the frequency with which each codon is used in a species of interest. For instance, in E. coli, alanine is encoded more frequently by GCG than by GCT. If you want to use the Reverse Translate tool on Pyrococcus, it is best to use its own codon usage table, which you can find here.
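As a rough illustration of what a codon usage table is used for, the sketch below parses a GCG-style table and picks the most frequent codon for each amino acid. The five-column layout (AmAcid, Codon, Number, /1000, Fraction) is my assumption about the GCG format, and the numbers in `TOY_TABLE` are placeholders invented for the example, not real P. furiosus or E. coli values. The function names are hypothetical too.

```python
def parse_gcg_codon_usage(text):
    """Return {amino_acid: {codon: fraction}} from GCG-style lines."""
    usage = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed data line: AmAcid, Codon, Number, /1000, Fraction.
        if len(fields) != 5 or len(fields[1]) != 3:
            continue
        try:
            fraction = float(fields[4])
        except ValueError:
            continue  # not a data line
        usage.setdefault(fields[0], {})[fields[1].upper()] = fraction
    return usage


def preferred_codons(usage):
    """Pick the codon with the highest fraction for each amino acid."""
    return {aa: max(codons, key=codons.get) for aa, codons in usage.items()}


# Placeholder values for illustration only (NOT real organism data).
TOY_TABLE = """\
AmAcid  Codon  Number  /1000  Fraction ..
Ala     GCG    1.00    1.00   0.10
Ala     GCA    4.00    4.00   0.40
Ala     GCT    3.00    3.00   0.30
Ala     GCC    2.00    2.00   0.20
Lys     AAG    6.00    6.00   0.60
Lys     AAA    4.00    4.00   0.40
"""

preferred = preferred_codons(parse_gcg_codon_usage(TOY_TABLE))
print(preferred)  # {'Ala': 'GCA', 'Lys': 'AAG'}
```

A "most likely" reverse translation just looks up each residue in such a mapping, so swapping in a different organism's table can change the output DNA without changing the encoded protein.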

score 3 · Accepted Answer · 2021-08-19

3

3.9 years ago

Mensur Dlakic ★ 29k

Because of codon degeneracy, a protein sequence cannot be back-translated to nucleotides unambiguously. Simply put, we can do DNA -> protein, but not protein -> DNA. One can use the most likely codon for each amino acid, but codon preferences differ between organisms. Again, what you want to do can only be done approximately.

Your best bet is probably to use tblastn to search your protein sequence against the P. furiosus genome. You will most likely get the exact genomic match, and then you will not have to do the back-translation at all.

2

I just ran a search against the P. furiosus genome and the results are here. Beware that they will expire in two days. There is no exact match for that protein sequence: the hit is only 80% identical. You should be able to find your codons in this sequence.

ADD REPLY • link 3.9 years ago by Mensur Dlakic ★ 29k

0

I did as you suggested in the answer with tblastn (I did not know this was possible) and found the DNA sequence by clicking on Graphics here. It is: GTAGTGAGCCTTTTCCACCTTCGCTAAATCTTCAAATATTTTCTTAATTTTCTCATCTTT. Indeed, if I search for this in the P. furiosus genome, I find it!
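As a sanity check on the sequence above, here is a small Python sketch that translates it and compares the result with the motif. Read as written, the sequence does not start with the motif's codons, so the sketch assumes the hit lies on the opposite strand and takes the reverse complement first. It also assumes the standard codon assignments, and the helper names are made up for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Complement each base, then reverse the sequence."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate frame 1 of `dna`, ignoring any trailing partial codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


hit = "GTAGTGAGCCTTTTCCACCTTCGCTAAATCTTCAAATATTTTCTTAATTTTCTCATCTTT"
motif = "KDEKIKEIFEDLAKEERNHY"

coding = reverse_complement(hit)
peptide = translate(coding)
matches = sum(a == b for a, b in zip(peptide, motif))

print(coding)
print(peptide)                             # KDEKIKKIFEDLAKVEKAHY
print(f"{matches}/{len(motif)} identical")  # 16/20 identical
```

The translated peptide matches the motif at 16 of 20 positions, which agrees with the 80% identity reported for the tblastn hit.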

ADD REPLY • link 3.9 years ago by Student ▴ 30