Protein to RNA codons
3.3 years ago

I have this question where we need to write code that takes a protein FASTA file and a protein sequence identifier, and counts all the possible RNA sequences that could encode the protein in the FASTA file, with the condition that the total number of combinations must be less than 5000.
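For the counting part, the number of possible RNA sequences can be worked out without building any of them: it is the product of the number of codons available for each residue. A minimal sketch, assuming the per-residue counts match the lengths of the tuples in the codon table in this post, that stop codons are ignored, and that the sequence is passed as a plain string (the names `CODON_COUNTS`, `count_rna_combinations` and `within_limit` are made up for illustration):

```python
from math import prod  # available in Python 3.8+

# Number of codons per amino acid (assumed to mirror the tuple lengths
# of the RNA codon table in this post; stop codons are not included).
CODON_COUNTS = {
    'A': 4, 'C': 2, 'D': 2, 'E': 2, 'F': 2, 'G': 4, 'H': 2, 'I': 3,
    'K': 2, 'L': 6, 'M': 1, 'N': 2, 'P': 4, 'Q': 2, 'R': 6, 'S': 6,
    'T': 4, 'V': 4, 'W': 1, 'Y': 2,
}

def count_rna_combinations(protein):
    """Return how many RNA sequences could encode `protein`."""
    # str() lets this accept a plain string or any object that converts to one.
    return prod(CODON_COUNTS[aa] for aa in str(protein))

def within_limit(protein, limit=5000):
    """True if the number of combinations is strictly below `limit`."""
    return count_rna_combinations(protein) < limit
```

For example, `count_rna_combinations("IEEA")` multiplies 3 × 2 × 2 × 4, so a check like `within_limit` can be run before attempting to list the combinations.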

I started by making an RNA codon dictionary, then wrote a function that puts the contents of the FASTA file (the amino acids) into a list, and then tried to build the combinations from that list. I get an error and can't work out where the problem is. If anyone can check the code and tell me what's wrong, I would be grateful.

import itertools

RNA_codon_table = {
'A': ('GCU', 'GCC', 'GCA', 'GCG'),
'C': ('UGU', 'UGC'),
'D': ('GAU', 'GAC'),
'E': ('GAA', 'GAG'),
'F': ('UUU', 'UUC'),
'G': ('GGU', 'GGC', 'GGA', 'GGG'),
'H': ('CAU', 'CAC'),
'I': ('AUU', 'AUC', 'AUA'),
'K': ('AAA', 'AAG'),
'L': ('UUA', 'UUG', 'CUU', 'CUC', 'CUA', 'CUG'),
'M': ('AUG',),
'N': ('AAU', 'AAC'),
'P': ('CCU', 'CCC', 'CCA', 'CCG'),
'Q': ('CAA', 'CAG'),
'R': ('CGU', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'),
'S': ('UCU', 'UCC', 'UCA', 'UCG', 'AGU', 'AGC'),
'T': ('ACU', 'ACC', 'ACA', 'ACG'),
'V': ('GUU', 'GUC', 'GUA', 'GUG'),
'W': ('UGG',),
'Y': ('UAU', 'UAC'),}

def protein_fasta (protein_file):
  protein_sequence = []
  protein = SeqIO.parse(protein_file, format = 'fasta')
  for Seqrecord in protein: 
     protein_sequence.append(Seqrecord.seq)
  print (protein_sequence)

for seq in protein_sequence:
     codons = [ list(RNA_codon_table[key]) for key in protein_sequence ]
print(list(itertools.product(codons)))

I don't know how to attach a FASTA file, but this is the sequence inside:

>seq_compl complete sequence
IEEATHMTPCYELHGLRWVQIQDYAINVMQCL

this is the error I get:

---------------------------------------------------------------------------
 KeyError                                  Traceback (most recent call last)
<ipython-input-65-3dd46947c505> in <module>
----> 1 all_combinations ('short_protein.fasta')

<ipython-input-64-45a50fffc1d9> in all_combinations(protein_file)
      5        protein_sequence.append(Seqrecord.seq)
      6 
----> 7    codons = [ list(RNA_codon_table[key]) for key in protein_sequence ]
      8    print(list(itertools.product(codons)))

<ipython-input-64-45a50fffc1d9> in <listcomp>(.0)
      5        protein_sequence.append(Seqrecord.seq)
      6 
----> 7    codons = [ list(RNA_codon_table[key]) for key in protein_sequence ]
      8    print(list(itertools.product(codons)))

 KeyError: Seq('IEEATHMTPCYELHGLRWVQIQDYAINVMQCL')
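Reading the traceback: the key being looked up is the whole `Seq('IEEATHMTPCYELHGLRWVQIQDYAINVMQCL')` object, not a single amino acid. `protein_sequence` is a list of sequences, so `for key in protein_sequence` yields each complete sequence rather than each residue. A second issue would show up once that is fixed: `itertools.product(codons)` treats `codons` as one iterable, so the per-residue lists need to be unpacked with `*codons`. Below is a sketch of one way to handle both points, assuming the sequence is passed as a string (a Biopython `Seq` should convert with `str()`); the function name `rna_combinations` and the `limit` parameter are made up for illustration.

```python
import itertools

# Same codon table as in the post.
RNA_codon_table = {
    'A': ('GCU', 'GCC', 'GCA', 'GCG'), 'C': ('UGU', 'UGC'),
    'D': ('GAU', 'GAC'), 'E': ('GAA', 'GAG'), 'F': ('UUU', 'UUC'),
    'G': ('GGU', 'GGC', 'GGA', 'GGG'), 'H': ('CAU', 'CAC'),
    'I': ('AUU', 'AUC', 'AUA'), 'K': ('AAA', 'AAG'),
    'L': ('UUA', 'UUG', 'CUU', 'CUC', 'CUA', 'CUG'), 'M': ('AUG',),
    'N': ('AAU', 'AAC'), 'P': ('CCU', 'CCC', 'CCA', 'CCG'),
    'Q': ('CAA', 'CAG'),
    'R': ('CGU', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'),
    'S': ('UCU', 'UCC', 'UCA', 'UCG', 'AGU', 'AGC'),
    'T': ('ACU', 'ACC', 'ACA', 'ACG'), 'V': ('GUU', 'GUC', 'GUA', 'GUG'),
    'W': ('UGG',), 'Y': ('UAU', 'UAC'),
}

def rna_combinations(protein, limit=5000):
    """List every RNA sequence encoding `protein`, if there are fewer than `limit`."""
    # Iterate over the residues of ONE sequence, not over a list of sequences.
    codons = [RNA_codon_table[aa] for aa in str(protein)]

    # Count first, so a long protein never triggers a huge enumeration.
    total = 1
    for options in codons:
        total *= len(options)
    if total >= limit:
        raise ValueError(f"{total} combinations, limit is {limit}")

    # *codons passes each residue's codon tuple as a separate argument.
    return [''.join(combo) for combo in itertools.product(*codons)]
```

Note that the 32-residue example sequence has far more than 5000 combinations, so with this sketch it raises `ValueError` instead of printing a list; a short peptide such as `rna_combinations("MF")` returns `['AUGUUU', 'AUGUUC']`. A residue that is not in the table (for example `X` or `*`) would still raise a `KeyError`.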

Thank you

RNA-Seq sequence • 605 views