Why does the bacterial codon table translate the GTG start codon to V and not M?
2.9 years ago
bioinfo2345

Consider the following sequence:

>sequence
GTGACCGGCAGCGCGGCCACGATCCGCCCGGCCAAGGCGGCCGATGCGGTCGCGTGGGCG
CAGCTGCGTCTGGGCCTGTGGCCCGATGCCGATGATCCGCTGGAGACGCTGGTGGCGGCG
CTGGCCGAGGACGCAGGTGCGGTTTTCCTGGCGTGTGCAGCGGGTGGCCAGGCGATCGGC
TTCGCCGAAGTGCGCCTGCGCCATGACTACGT


In this bacterial organism, GTG is an alternative start codon. This means that it can initiate translation via the initiator tRNA, which inserts methionine (M) into the protein. However, if GTG occurs inside the sequence, it is translated to valine (V) as usual.

However, programs that translate a nucleotide sequence into an amino acid sequence (such as EBI transeq online, command-line transeq, or blastx of this sequence against its translation, which starts with V) translate the above sequence into a protein that begins with V, regardless of whether the standard or the bacterial codon table is used. In fact, even the command-line transeq option -methionine does not produce the desired M.
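A minimal Python sketch of the behaviour described above, assuming an in-frame DNA string. Table 11 assigns exactly the same amino acids as the standard code (it differs only in which codons may initiate), so a plain codon-by-codon lookup gives V for GTG under either table; the first codon becomes M only when the translator is explicitly told that it is the initiation codon. The function `translate` and its `first_codon_is_start` flag are hypothetical illustrations, not the interface of transeq or blastx.

```python
from itertools import product

# NCBI translation table 11 (bacterial, archaeal and plant plastid code).
# Its amino-acid assignments are identical to the standard code (table 1);
# it differs only in which codons are allowed to act as initiation codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
START_CODONS_TABLE_11 = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def translate(seq, first_codon_is_start=False):
    """Translate an in-frame DNA sequence codon by codon (hypothetical helper).

    If first_codon_is_start is True and the first codon is a table-11 start
    codon, it is read as the initiation codon and translated as M. Every other
    codon, including internal GTGs, gets its ordinary amino acid. Unknown
    codons become X and a trailing partial codon is ignored.
    """
    seq = seq.upper()
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if i == 0 and first_codon_is_start and codon in START_CODONS_TABLE_11:
            protein.append("M")
        else:
            protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


start = "GTGACCGGCAGCGCG"  # first five codons of the sequence in the question
print(translate(start))                             # VTGSA
print(translate(start, first_codon_is_start=True))  # MTGSA
```

For comparison, Biopython exposes the same idea through `Seq.translate(table=11, cds=True)`: a valid alternative start codon is then translated as M, but the call raises an error unless the input is a complete CDS (length a multiple of three, a single in-frame stop codon at the end).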

Questions:

Why is GTG at the start of this sequence not translated to M when using the bacterial codon table?

What use is the bacterial codon table then, if it does not behave differently in this case (since this is the only major difference between the standard and bacterial codon tables)?


How would the software know that what it was looking at was the first codon of the coding region, and not in the middle?
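In general it can't: a translator only knows that the first codon is a start codon if the user says so, or if it applies structural checks before trusting it. A minimal Python sketch of such checks, loosely modelled on what Biopython documents for `Seq.translate(..., cds=True)`; the function name `looks_like_complete_cds` is hypothetical.

```python
from itertools import product

# Table-11 amino-acid assignments (identical to the standard code).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
START_CODONS_TABLE_11 = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def looks_like_complete_cds(seq):
    """Return (ok, reason) after a few structural checks (hypothetical helper)."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        return False, "length is not a multiple of 3"
    if len(seq) < 6:
        return False, "too short to hold a start and a stop codon"
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] not in START_CODONS_TABLE_11:
        return False, "first codon is not a table-11 start codon"
    if CODON_TABLE.get(codons[-1]) != "*":
        return False, "does not end with a stop codon"
    # Stops between the first and last codon mean this is not one ORF.
    if any(CODON_TABLE.get(c) == "*" for c in codons[1:-1]):
        return False, "contains an internal stop codon"
    return True, "ok"


print(looks_like_complete_cds("GTGGCGTAA"))  # (True, 'ok')
```

The sequence as posted in the question is 212 nt long, not a multiple of three, so it fails the first check. Passing all checks is still not proof: the real start codon could lie further upstream, which is why it matters whether an annotation is experimentally supported.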


EBI does the same: it translates the alternative initiation codons as M.

2.3 years ago
Juke34

I strongly disagree with what @Joe says!
Yes, translation can actually initiate from pretty much any codon, but the first amino acid attached is a methionine.

At first I was very surprised that such a thing (a non-methionine residue at the N-terminus) had never reached my ears, since it goes against the dogma everyone learns at school.

Then I was sceptical: how could NCBI and EBI (two eminent infrastructures in the field) not be aware of it and handle the first amino acid in their records wrongly?

I read the publication suggested by @Joe and realised that what you say, @Joe, is not written in it. Below is the most important part of the publication regarding the amino acid attached at the N-terminus.

In the Results section:

We used proteolytic digestion and mass spectrometry to determine if translation began at modified start codons for five selected codons (AUC, ACG, CAU, GGA and CGC, please see ‘Materials and Methods’ section). We cloned a 6x-His tag into the C-terminus of these five genes and, following expression and purification, recovered significant amounts of protein. Little to no protein was recovered from the CGC culture, as expected. We digested proteins with AspN and analyzed the mixture via mass spectrometry. Each expressed protein released peptides of intact N-termini that included an N-terminal methionine. [...] In cultures with ACG as the start codon a small fraction of spectra (1 of 8) indicated that the N-terminal peptide might be the cognate amino acid, threonine (Mr = 119), with a mass shift of −30 Da relative to methionine (Mr = 149) (Supplementary Tables S8 and 11). Other researchers have also observed methionine in the N-terminal position of proteins whose translation initiates from GUG or UUG start codons.

In the Discussion section:

We observed evidence of translation initiation with N-terminal methionine from four codons (AUC, ACG, CAU and GGA), and with the N-terminal cognate amino acid in one spectrum from one codon (ACG) (Supplementary Tables S7–11). In the spectra in which we observed N-terminal methionine, it is likely that tRNAfMet is the initiating tRNA. We did not perform comprehensive mass spectrometry experiments to identify the N-terminal amino acid from the remaining codons, so we cannot be certain from which codon, with which tRNA and with which amino acid, translation is initiating.

Almost all E. coli genes with non-AUG start codons initiate with methionine as the N-terminal amino acid (4,6,7,65,66,87,88), and such events are not considered to be errors in translation initiation. By this same logic, we argue that translation initiation of genes with other non-AUG codons, in which methionine is observed as the N-terminal amino acid, should also not be considered an error.

To conclude, the paper shows that AUC (normally Ile), ACG (normally Thr), CAU (normally His) and GGA (normally Gly) start codons give an N-terminal methionine (CGC yielded little to no protein), and the authors cite other papers showing the same for GUG (normally Val) and UUG (normally Leu).

=> Non-methionine codons normally code for their corresponding amino acid, but when they act as start codons a methionine is incorporated instead.
=> EBI and NCBI are correct.
=> @bioinfo2345, it is correct for your GTG (V) to be recorded as M in the EBI/NCBI databases. But a translation tool doesn't know whether your sequence is complete (i.e. whether the first codon really is a start codon), so by default it just translates the corresponding amino acid. This is the case for most tools, and you see the same behaviour in BioPerl and Biopython.
=> The only point on which I might agree with @Joe is that methionine seems not to be required 100% of the time. In the publication they say that for ACG as a start codon, a small fraction of spectra (1 of 8) indicated that the N-terminal residue might be the cognate amino acid, threonine. As with everything in biology, it is never 100%. But it has been shown only for threonine.
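Where the −30 Da in the quoted passage comes from: it is simply the mass difference between threonine and methionine. A small Python check, using the free-amino-acid formulas and IUPAC conventional average atomic masses (values supplied here, not taken from the paper):

```python
# IUPAC conventional average atomic masses, in Da (supplied for this sketch).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}

# Elemental formulas of the free amino acids.
FORMULAS = {
    "Met": {"C": 5, "H": 11, "N": 1, "O": 2, "S": 1},  # C5H11NO2S
    "Thr": {"C": 4, "H": 9, "N": 1, "O": 3},           # C4H9NO3
}


def average_mass(formula):
    """Sum atomic masses weighted by atom counts."""
    return sum(ATOMIC_MASS[element] * n for element, n in formula.items())


met = average_mass(FORMULAS["Met"])
thr = average_mass(FORMULAS["Thr"])
shift = thr - met
print(f"Met {met:.2f} Da, Thr {thr:.2f} Da, shift {shift:+.2f} Da")
# Met 149.21 Da, Thr 119.12 Da, shift -30.09 Da
```

Both residues lose the same water when they join a peptide chain, so the same shift applies to the N-terminal residue of a peptide.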


I think I may have misspoken. The point I was trying to make was not that translations start with different amino acids, but that prototypical methionine codons are not required for the incorporation of a leading methionine (and therefore it is not necessarily wrong for a gene to start with a codon you wouldn't expect; when simple translation rules are applied, you get proteins that appear to start with different amino acids).


What you say now is correct, but it is not what you said before:

The tRNA for valine would recognise the codon as usual, not the fMet tRNA, so its not methionine being placed in to the first position at all.

That was wrong, because although non-canonical start codons usually code for amino acids other than methionine, when they act as start codons they code for Met (in eukaryotes) or fMet (in prokaryotes).


Not always. As you so kindly quoted, that same study did find a non-methionine amino acid (threonine) incorporated in the start position, though M was more common, as you point out. Thus what I originally said is not actually wrong.

I would be willing to bet right now that, given how variable biology is, it is valid for any number of amino acids to start a peptide, and our "proteins start with ATG -> M" paradigm is simply not true in all cases, even if it's true most of the time.

I accept the point you're making, but what I said was not technically wrong.


I still disagree.

First, as they nicely put it: "All biological processes are governed by processes that imply a certain rate of unlikely events, and such unlikely events are often referred to as errors, failures or leaks." The only putative case they report in their publication is the ACG start codon, which seems to code for threonine 1/8 of the time and for methionine 7/8 of the time. So it could be seen as a biological error or failure.

I haven't found any clear case in the literature of a non-methionine N-terminal amino acid, except when the methionine is excised post-translationally.

So if it happens, it's extremely rare and has been shown only for the ACG start codon (1/8 of the time), probably owing to close physicochemical features, as threonine is the cognate amino acid.

Thus saying that it is valid for any number of amino acids to start a peptide is wrong... until the opposite is proven.


How do we know what nature considers a leak? What we consider "errors" are baked into the fabric of what makes biology what it is: you can't have evolution without them.

Arguing about how rare it is isn't the point. I said there are times when start codons that aren't typical methionine codons can start a peptide. There's no false statement in that. I expanded on this to say that such a codon might also not incorporate a methionine at all, which is also completely true, however rare, as your own quotes from that same paper show. The reason we may not have found other amino acids at the first position could be that it's biologically impossible (my experience of wet-lab biology makes me think this is very unlikely, hence my bet), or that we haven't actually looked systematically yet.

It isn’t wrong until the opposite is proven. It is neither right nor wrong - yet.

I feel like you’re being a bit disingenuous by taking what I said out of context. I said I’m willing to bet that this is the case. Inherent in that statement is that we don’t know yet, and it requires studying further to find out, but my money is on “all proteins, in every organism, in every case, start with Ms” being wrong, just like the central dogma of molecular biology has been proven wrong again and again (Remember when we thought DNA only templated RNA, and never the other way around? Or when we thought proteins always made DNA and never RNA intermediates?). These rules never hold for long.


I really like it when it becomes philosophical :) I shouldn't have taken this sentence out of context (but I liked it...). In the paper, they use it to refer to non-ATG start codons, which they see as a potential feature rather than an error. They are balanced in their interpretation.

Sorry, I didn't mean to be disingenuous; I don't catch all the subtleties of English and probably misunderstood. It's true that in biology we are often surprised by new discoveries that change our conceptions.


Thus saying that it is valid for any number of amino acids to start a peptide is wrong... until the opposite is proven.

Does this sort of counter your own statement?

As everything in biology, it is never 100%. But it is shown only for Threonine.

For the sake of completeness, this is the section in question, as far as I can see:

We used proteolytic digestion and mass spectrometry to determine if translation began at modified start codons for five selected codons (AUC, ACG, CAU, GGA and CGC, please see ‘Materials and Methods’ section). We cloned a 6x-His tag into the C-terminus of these five genes and, following expression and purification, recovered significant amounts of protein. Little to no protein was recovered from the CGC culture, as expected. We digested proteins with AspN and analyzed the mixture via mass spectrometry. Each expressed protein released peptides of intact N-termini that included an N-terminal methionine (Supplementary Tables S7–11). ACG and AUC are one base away from AUG, while GGA and CAU would require two and three concurrent point mutations, respectively, to revert to a canonical start codon. In cultures with ACG as the start codon a small fraction of spectra (1 of 8) indicated that the N-terminal peptide might be the cognate amino acid, threonine (Mr = 119), with a mass shift of −30 Da relative to methionine (Mr = 149) (Supplementary Tables S8 and 11). Other researchers have also observed methionine in the N-terminal position of proteins whose translation initiates from GUG or UUG start codons (4,65,66).

My interpretation of this is that in one out of four start codons they found a remarkable deviation from the rule. To me that reads like a quarter of their cases, so I strongly agree with Juke34's statement, with a minor modification:

As everything in biology, it is never 100%. But it has been shown only for Threonine, yet.

Which reads much like the point Joe is trying to make, don't you think?

2.9 years ago
Joe

Your premise is wrong. It is an alternative start codon, because methionine isn’t required 100% of the time. It is valid for proteins to start with other amino acids, it’s just less common.

The tRNA for valine would recognise the codon as usual, not the fMet tRNA, so it's not necessarily methionine being placed into the first position at all.

In fact, such is biology’s disregard for what few rules we attempt to hold dear, that translation can actually occur from pretty much any initial codon. The degree of translation simply changes.


In fact, such is biology’s disregard for what few rules we attempt to hold dear, that translation can actually occur from pretty much any initial codon. The degree of translation simply changes.

Thanks for sharing this one. I sort of fundamentally believed in the canonical start codons, with exceptions occurring only in somewhat "exotic" organisms.


Are you sure the first amino acid in these proteins is Valine?

This is the NCBI entry for Mycobacterium tuberculosis KatG; it gives gtg as the start codon and M as the first amino acid:

https://www.ncbi.nlm.nih.gov/nuccore/NC_000962.3?from=2153889&to=2156111&report=genbank&strand=true


Good that you have been critical here. The first amino acid in these proteins is M, and the INSDC databases (NCBI, EBI and DDBJ) are correct when they show M.


Thank you for your fast reply. It really helped to better understand the situation. Very nice paper you linked as well.

So for clarification, this protein, for instance, is mistranslated?

https://www.ncbi.nlm.nih.gov/nuccore/JQ396378.1?report=fasta (starts with TTG) https://www.ncbi.nlm.nih.gov/protein/AFO09968.1?report=fasta (starts with M)

(...because TTG is translated to L when using transeq with the standard or bacterial codon table, and it is not terribly surprising that errors exist in GenBank, as it is not manually curated?)


You need to find out whether the sequence is experimentally determined or not. If it’s just an automatic annotation, it could be right, but could be wrong.

Also bear in mind that there isn't a one-size-fits-all bacterial codon table. The classic table 11 is essentially determined from E. coli, but codon frequencies and usage can vary between species.
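To make "codon usage" concrete: it is the relative frequency of each codon across a species' coding sequences, and comparing such profiles is how differences between bacteria show up even when they share table 11. A minimal sketch for a single in-frame sequence; the function name is illustrative, and a real analysis would pool all annotated CDSs from a genome.

```python
from collections import Counter


def codon_usage(seq):
    """Return the relative frequency of each codon in an in-frame DNA sequence.

    A trailing partial codon is ignored; an empty input gives an empty dict.
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


print(codon_usage("GTGGTGCTGGCG"))  # {'GTG': 0.5, 'CTG': 0.25, 'GCG': 0.25}
```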


It appears to be directly submitted and translated with table 11, so perhaps the people who directly submitted it had fallen for the same problem that I had. They saw TTG and thought M, when it is really L.

2.3 years ago
Juke34

I found this post now, where it is well described: https://biology.stackexchange.com/questions/56939/do-all-proteins-start-with-methionine