Question

Reading frames in python


3.5 years ago

Gonçalo • 0

Hello everyone, I have to write a function that takes a nucleotide sequence and outputs a dictionary with its translation in all possible reading frames. The keys of the dictionary should be x1, x2, x3 for the forward frames and y1, y2, y3 for the reverse frames, and the value of each key should be the translation of the sequence in the corresponding reading frame. I do not need to complement the sequence when computing the reverse reading frames, just reverse it, and I should use a * to represent a stop codon. I think I am on the right path, but I have been trying to create a loop to go through each key of the dictionary and I'm struggling a lot. Any help would be very much appreciated. For now I am working with a small sequence and will wrap everything in a function at the end.

This is what I have got:

sequence = "ATGACAGTAGACAGATAGGGGACAGT"
position = 0
protein = ""
dictionary = {}
gencode = {
    'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
    'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
    'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
    'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
    'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
    'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
    'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
    'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
    'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
    'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
    'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
    'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
    'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
    'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
    'TAC':'Y', 'TAT':'Y', 'TAA':'*', 'TAG':'*',
    'TGC':'C', 'TGT':'C', 'TGA':'*', 'TGG':'W'}

list_dna = list(sequence)

# forward frames
x1 = "".join(list_dna)
x2 = "".join(list_dna[1:])
x3 = "".join(list_dna[2:])

# reverse frames (reversed only, not complemented)
list_dna.reverse()
y1 = "".join(list_dna)
y2 = "".join(list_dna[1:])
y3 = "".join(list_dna[2:])

dictionary[x1] = ""
dictionary[x2] = ""
dictionary[x3] = ""
dictionary[y1] = ""
dictionary[y2] = ""
dictionary[y3] = ""

while position + 3 <= len(sequence):  # not sure how to proceed from here

translation python • 1.3k views



I think you may be overthinking/complicating the task, but it appears this is an assignment so I'm afraid we can't give fully functional code ;)

You need to approach the problem in logical steps, which will broadly look like:

Iterate over a string, capturing 3 characters at once, with an offset.

This might look something like for i in range(0, len(seq), 3): ...
Capture the codons: x1 = seq[i:i+3], x2 = seq[i+1:i+4]...etc
Use the codons to look up the translation (gencode[x1])
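As a rough illustration of the slicing step only (not a full solution), the sketch below splits the example sequence from the question into 3-letter chunks for a given offset. The helper name `chunks` is made up for this example. Note that the last chunk can come out shorter than 3 letters, depending on the offset.

```python
def chunks(seq, offset=0):
    """Split seq into consecutive 3-letter slices starting at offset.

    Hypothetical helper for illustration: the final slice is returned
    as-is, so it may hold only 1 or 2 letters.
    """
    return [seq[i:i + 3] for i in range(offset, len(seq), 3)]


seq = "ATGACAGTAGACAGATAGGGGACAGT"  # example sequence from the question
for offset in range(3):
    print(offset, chunks(seq, offset))
```

With this 26-letter sequence, offsets 0 and 1 end in an incomplete chunk ("GT" and "T"), while offset 2 happens to end on a complete codon.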

You will need to do some extra fiddling to deal with stop codons and what happens when you reach the end of the sequence and it isn't an exact multiple of 3.
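One possible way to handle both points is sketched below, under a few assumptions: stop codons are written as * and translation simply carries on past them (as the question asks), a trailing incomplete codon is ignored, and the reverse frames are reversed but not complemented. The names `translate` and `six_frames` are invented for this example, and the codon table is the standard genetic code built in a compact form rather than typed out.

```python
from itertools import product

# Standard genetic code, built compactly: codons enumerated in TCAG order
# line up with this 64-letter amino-acid string ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENCODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, offset=0):
    """Translate seq from offset, ignoring a trailing incomplete codon."""
    # Stopping at len(seq) - 2 guarantees every slice is a full 3 letters.
    # A codon with letters other than A, C, G, T raises KeyError here.
    return "".join(GENCODE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3))


def six_frames(seq):
    """Hypothetical helper: map x1..x3 (forward) and y1..y3 (reversed) to proteins."""
    seq = seq.upper()
    rev = seq[::-1]  # reversed only, NOT complemented, as the question specifies
    frames = {}
    for offset in range(3):
        frames[f"x{offset + 1}"] = translate(seq, offset)
        frames[f"y{offset + 1}"] = translate(rev, offset)
    return frames


for name, protein in sorted(six_frames("ATGACAGTAGACAGATAGGGGACAGT").items()):
    print(name, protein)
```

For the example sequence this gives x1 = MTVDR*GT. If the assignment instead expects translation to stop at the first *, the join in `translate` would need to become an explicit loop with a break.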

It's worth mentioning too that this is quite a common assignment/challenge, so you should find no shortage of solutions on here or on Stack Overflow.

3.5 years ago by Joe


That makes sense, thank you very much.

3.5 years ago by Gonçalo


Only add answers when you're answering the principal question. Otherwise, use Add Comment or Add Reply as appropriate.

3.5 years ago by Ram