5.4 years ago
alexille5640
I am curious about the reliability of UniProt amino acid sequences.
Say I want to do the following:
- Reverse translate a reviewed UniProt amino acid sequence into a nucleotide sequence (using Sequence Manipulation Suite Reverse Translate)
- Synthesize that nucleotide sequence in vitro and clone it into an expression vector plasmid
- Transfect cells with that plasmid for expression
Is it expected that a protein with the exact same amino acid sequence will be expressed by the cells?
UniProtKB encompasses several individual protein sequence resources that are depicted on this page. If you are talking about a sequence that is from SwissProt (manually reviewed/curated sequences) or UniRef100 clusters, then that sequence is likely perfectly accurate. Every SwissProt record should have a corresponding nucleotide entry, so you should not need to do any sequence manipulation (but there may be an exon/intron model to consider).
That said, it is unclear what the aim of this question is. Do you wish to express a protein that is not normally present in that host?
Sorry for my lack of knowledge, but how do you access the corresponding nucleotide entry for, say, the Mus musculus Actin Beta entry from UniProt (https://www.uniprot.org/uniprot/P60710)?
If you scroll down to the Sequence Databases section on the page you linked, you will find this information.
Yes (for SwissProt), assuming ceteris paribus. Question is, can we ensure ceteris paribus?
How do you propose to reverse translate the amino acid sequence? The number of resulting possibilities is astronomical... just using the most frequent codon or something?
Yes, relying on the degeneracy of codons, I'm assuming "Sequence Manipulation Suite: Reverse Translate" is sufficient.
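For illustration, the "most frequent codon" approach can be sketched in a few lines of Python. This is only a rough sketch and not how Sequence Manipulation Suite is implemented: the `PREFERRED` table is an illustrative placeholder for a host codon-usage table (check a real usage table for your expression host), the function names are made up, and the peptide is a hypothetical toy sequence. The round-trip check at the end is the relevant point: under the standard genetic code, any synonymous coding sequence translates back to the same amino acid sequence.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# ASSUMPTION: one "preferred" codon per amino acid. Illustrative values only;
# replace with the most frequent codons from a usage table for your host.
PREFERRED = {
    "A": "GCC", "R": "AGA", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}


def reverse_translate(protein):
    """Return one coding sequence using the preferred codon at every position."""
    return "".join(PREFERRED[aa] for aa in protein.upper())


def translate(dna):
    """Translate a coding sequence (length divisible by 3) with the standard code."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


toy_peptide = "MKTAYIAKQR"  # hypothetical example, not a real UniProt entry
coding_seq = reverse_translate(toy_peptide)
assert translate(coding_seq) == toy_peptide  # round trip recovers the protein
```

Note that this only covers the coding sequence itself. A real construct would still need a stop codon and the usual expression elements, and post-translational processing in the host is outside what a sequence check can tell you.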
What do you mean by that?
Degeneracy describes how one amino acid can be encoded by multiple codons.
I am aware of the concept of codon degeneracy. I was asking what you meant by "relying on" codon degeneracy for picking the most frequent codon. Without codon degeneracy one wouldn't encounter any form of frequency, so I was trying to understand the point you were making.
I have a somewhat related question: are there web-based tools available to generate hundreds to thousands of DNA sequences from a single protein sequence? There are plenty of codon optimization tools, but I want to generate many nt sequences starting from the same protein sequence to analyze potential DNA secondary structure variants.
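If a small script is acceptable instead of a web tool, one possible approach is to pick a random synonymous codon at each position and keep the distinct results. The following is a minimal sketch under that assumption: the function name `sample_back_translations` and the toy peptide are hypothetical, and codons are drawn uniformly rather than weighted by any organism's codon usage.

```python
import random
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Invert the table: amino acid -> list of synonymous codons.
AA_TO_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def sample_back_translations(protein, n, seed=0):
    """Return up to n distinct coding sequences for protein, sampled uniformly.

    Hypothetical helper: each position gets a random synonymous codon.
    Intended for n far below the total number of possible sequences.
    """
    protein = protein.upper()
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    total = 1
    for aa in protein:
        total *= len(AA_TO_CODONS[aa])
    target = min(n, total)  # cannot return more than exist
    seen = set()
    while len(seen) < target:
        seen.add("".join(rng.choice(AA_TO_CODONS[aa]) for aa in protein))
    return sorted(seen)


def translate(dna):
    """Translate a coding sequence with the standard code."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


variants = sample_back_translations("MKTAYIAKQR", 500)  # toy peptide
assert all(translate(v) == "MKTAYIAKQR" for v in variants)
```

The resulting sequences could then be written to a FASTA file and passed to whatever secondary structure tool you are using.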
I don't know about web-based back translation, but I have a tool here that might be of interest:
https://github.com/jrjhealey/bioinfo-tools/blob/master/backtranslate.py
Note, this doesn't generate all possible sequences (since it's an astronomical number of sequences); it's just a useful way of showing which codons are the most redundant in a given sequence.
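To put a number on "astronomical": the count of possible coding sequences is the product of the per-residue codon degeneracies. The sketch below is independent of the linked script; the function names and the toy peptide are hypothetical, and the standard genetic code is assumed.

```python
import math
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Amino acid -> list of synonymous codons.
AA_TO_CODONS = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    AA_TO_CODONS.setdefault(aa, []).append("".join(codon))


def degeneracy_profile(protein):
    """Number of synonymous codons available at each residue."""
    return [len(AA_TO_CODONS[aa]) for aa in protein.upper()]


def count_back_translations(protein):
    """Total number of distinct coding sequences for protein."""
    return math.prod(degeneracy_profile(protein))


print(degeneracy_profile("MKTAYIAKQR"))       # [1, 2, 4, 4, 2, 3, 4, 2, 2, 6]
print(count_back_translations("MKTAYIAKQR"))  # 18432
```

Even this 10-residue toy peptide has 18,432 possible coding sequences, and the count multiplies with every additional residue, which is why exhaustive enumeration is impractical for a full-length protein.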