Got an error while translating an mRNA sequence into a protein
16 months ago
caro-ca ▴ 20

Hello! I am trying to translate an RNA sequence into a protein using a dictionary. If a codon is not found in the dictionary, I use get() to return a default value instead. However, I get a KeyError when I run my code like this: python3 translate_rna.py "AUGUNCGGU". Could you tell me what is wrong? Thank you for your help.

  def translate_rna(mRNA):
        """Return a translated sequence from an mRNA sequence 

        mRNA -- str, mRNA sequence
        """
        dict_amino_codons = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
                            "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
                            "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
                            "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
                            "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
                            "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
                            "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
                            "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
                            "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
                            "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
                            "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
                            "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
                            "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
                            "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
                            "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
                            "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G",}

        complete_protein_seq = ""       #String of the complete protein sequence
        last_codon_start = len(mRNA) -2       #Any sequence length is analyzed 

        for letter in range(0, last_codon_start, 3):
            mrna_codon = mRNA[letter:letter+3]
            if dict_amino_codons[mrna_codon] == "STOP":
                break
            else:
                complete_protein_seq += dict_amino_codons.get(mrna_codon, 'X')          #complete_protein_seq += dict_amino_codons[mrna_codon]
        print(complete_protein_seq)

    if __name__ == '__main__':
        input_file = argv[1] 
        translate_rna(input_file)

This is the error given:

Traceback (most recent call last):
  File "translate_rna.py", line 46, in <module>
    translate_rna(input_file)
  File "translate_rna.py", line 38, in translate_rna
    if dict_amino_codons[mrna_codon] == "STOP":
KeyError: 'UNC'
python dictionary • 536 views

The problem is that you aren't accounting for the case where an unknown codon reaches your STOP check. The get() line isn't the one raising the error.

When your code looks up UNC with dict_amino_codons[mrna_codon], the key isn't in the dictionary, and plain square-bracket indexing has no default value (unlike your .get()), so it raises a KeyError.

Basically you just need to wrap if dict_amino_codons[mrna_codon] == "STOP" with some logic for handling ambiguous codons; this is probably a good use for a try/except block.
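
One way the try/except idea could look, as a rough sketch rather than a definitive fix: the codon table here is a hypothetical three-entry subset (CODON_TABLE), and the helper name translate_codons is made up for illustration. Note that the except branch assigns an X; a bare "except: continue" would skip the unknown codon entirely and no X would appear in the output.

```python
# Hypothetical subset of the codon table, just enough for the example input.
CODON_TABLE = {"AUG": "M", "GGU": "G", "UAG": "STOP"}

def translate_codons(mrna):
    """Translate mrna codon by codon, using 'X' for unknown codons."""
    protein = ""
    for start in range(0, len(mrna) - 2, 3):
        codon = mrna[start:start + 3]
        try:
            amino_acid = CODON_TABLE[codon]   # raises KeyError for e.g. 'UNC'
        except KeyError:
            amino_acid = "X"                  # ambiguous codon: record a placeholder
        if amino_acid == "STOP":
            break                             # stop codon ends translation
        protein += amino_acid
    return protein

print(translate_codons("AUGUNCGGUUAG"))  # MXG
```

Catching KeyError specifically, rather than using a bare except, keeps unrelated errors visible.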


Thank you for your help. However, I don't understand why the default of get() is not working, since (quoting a website) get() "returns a value for the given key. If key is not available then returns default value".


How do you know get() isn't working? Your code is failing before it even gets to that line. As far as I can tell, the get(key, default) approach should work fine but you have to address the upstream error.
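
The difference between the two lookups can be checked in isolation. A small sketch, using a hypothetical one-entry dictionary named codons:

```python
codons = {"AUG": "M"}  # hypothetical one-entry table

# .get() falls back to the supplied default when the key is missing.
print(codons.get("UNC", "X"))  # X

# Square-bracket indexing has no default and raises KeyError instead.
try:
    codons["UNC"]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'UNC'
```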


Thank you for your help. I will correct it.


Thank you for your help. However, I faced another problem described below. Hope you could help me out. Thanks

16 months ago
Joe 19k

Since the only options are:

  • Codon is present in dict -> retrieve value
  • Codon is ambiguous (not present in dict) -> insert an X
  • Codon is a STOP

the simplest solution is just not to poll the dictionary for the STOP again, and check on the growing string:


import sys

def translate_rna(mRNA):
    """Return a translated sequence from an mRNA sequence

    mRNA -- str, mRNA sequence
    """
    dict_amino_codons = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
                         "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
                         "UAU":"Y", "UAC":"Y", "UAA":"*", "UAG":"*",
                         "UGU":"C", "UGC":"C", "UGA":"*", "UGG":"W",
                         "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
                         "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
                         "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
                         "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
                         "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
                         "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
                         "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
                         "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
                         "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
                         "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
                         "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
                         "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G"}

    complete_protein_seq = ""       #String of the complete protein sequence
    last_codon_start = len(mRNA) -2       #Any sequence length is analyzed

    for letter in range(0, last_codon_start, 3):
        mrna_codon = mRNA[letter:letter+3]
        complete_protein_seq += dict_amino_codons.get(mrna_codon, 'X')
        if complete_protein_seq.endswith("*"):
            break

    print(complete_protein_seq)

if __name__ == '__main__':
    input_file = sys.argv[1]
    translate_rna(input_file)

I switched to using * because it's less confusing than "STOP" in a sequence, given that S, T and P are valid amino acids, but feel free to change it back.


Yes, at the end I don't print the word STOP, but good point. Thanks.

16 months ago
caro-ca ▴ 20

Thank you again for your help. I also tried your suggestion of using try/except, and oddly it returned the translated protein but with no "X". How can that be?

def translate_rna(mRNA):
        """Return a translated sequence from a mRNA sequence 

        mRNA -- str, mRNA sequence
        """
        dict_amino_codons = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
                            "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
                            "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
                            "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
                            "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
                            "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
                            "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
                            "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
                            "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
                            "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
                            "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
                            "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
                            "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
                            "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
                            "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
                            "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G",}

        complete_protein_seq = ""       #String of the complete protein sequence
        last_codon_start = len(mRNA) -2       #Any sequence length is analyzed 

        for letter in range(0, last_codon_start, 3):
            mrna_codon = mRNA[letter:letter+3]
            try:
                if dict_amino_codons[mrna_codon] == "STOP":
                    break
            except:
                continue 
            else:
                complete_protein_seq += dict_amino_codons.get(mrna_codon, 'X')    
        print(complete_protein_seq)

    if __name__ == '__main__':
        input_file = argv[1] 
        translate_rna(input_file)

Stdout:

python3 translate_rna.py "AUGUNCGGUUAG"
MG