Question: Reading frames in Python
Asked 5 weeks ago by Ionic_Bond (London):

Hello everyone, I have to write a function that takes a sequence of nucleotides and outputs a dictionary with the translation in all possible reading frames. The keys of the dictionary should be x1, x2, x3 for the forward frames and y1, y2, y3 for the reverse reading frames, and the value of each key should be the translation of the sequence in the corresponding reading frame. I do not need to complement the sequence when computing the reverse reading frames, just reverse it, and I should use a * to represent stop codons. I think I am on the right path, but I have been trying to create a loop that goes through each key of the dictionary and I'm struggling a lot. Any help would be very much appreciated. I am working with a small sequence for now and will wrap the code in a function at the end.

This is what I have got:

sequence = "ATGACAGTAGACAGATAGGGGACAGT"
position = 0    
protein= ""    
dictionary= {}    
gencode = {
          'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
          'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
          'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
          'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
          'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
          'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
          'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
          'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
          'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
          'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
          'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
          'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
          'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
          'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
          'TAC':'Y', 'TAT':'Y', 'TAA':'*', 'TAG':'*',
          'TGC':'C', 'TGT':'C', 'TGA':'*', 'TGG':'W'}

list_dna = list(sequence)

x1 = "".join(list_dna)
x2 = "".join(list_dna[1:])
x3 = "".join(list_dna[2:])

list_dna.reverse()

y1 = "".join(list_dna)
y2 = "".join(list_dna[1:])
y3 = "".join(list_dna[2:])

dictionary[x1]=""
dictionary[x2]=""
dictionary[x3]=""
dictionary[y1]=""
dictionary[y2]=""
dictionary[y3]=""

while position +3<=len(sequence):  #not sure to proceed from here

I think you may be overthinking/complicating the task, but it appears this is an assignment so I'm afraid we can't give fully functional code ;)

You need to approach the problem in logical steps, which will broadly look like:

Iterate over a string, capturing 3 characters at once, with an offset.

  • This might look something like for i in range(0, len(seq), 3): ...
  • Capture the codons: x1 = seq[i:i+3], x2 = seq[i+1:i+4]...etc
  • Use the codons to look up the translation (gencode[x1])
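The steps above can be sketched for the forward frames as follows. This is only one possible reading of the outline, not a complete solution: the helper name `codons` and its `offset` parameter are made up for illustration, the offset is applied by starting the `range` at 0, 1 or 2, and the codon table is rebuilt compactly so the block runs on its own (it should contain the same 64 entries as the `gencode` dictionary in the question).

```python
from itertools import product

# Standard genetic code, rebuilt compactly so this block is self-contained.
# Codons are generated in T, C, A, G order to line up with AMINO_ACIDS.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
gencode = {"".join(codon): aa
           for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons(seq, offset=0):
    """Yield each complete 3-base codon of seq, starting at offset.

    Hypothetical helper for illustration only.
    """
    # Stopping the range at len(seq) - 2 skips a trailing chunk of 1 or 2 bases.
    for i in range(offset, len(seq) - 2, 3):
        yield seq[i:i + 3]


seq = "ATGACAGTAGACAGATAGGGGACAGT"
for offset in range(3):
    frame = list(codons(seq, offset))
    # Look up each codon in the table and join the amino acids.
    protein = "".join(gencode[codon] for codon in frame)
    print(offset, frame, protein)
```

Starting the range at the offset, rather than slicing `seq[i+1:i+4]` inside a single loop, keeps each frame separate, which may be easier to map onto one dictionary key per frame.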

You will need to do some extra fiddling to deal with stop codons and what happens when you reach the end of the sequence and it isn't an exact multiple of 3.
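As a rough illustration of that extra fiddling, here is a minimal sketch under a few stated assumptions: the input contains only uppercase A, C, G and T; a trailing chunk of one or two bases is simply dropped; and a stop codon is written as * with translation continuing past it unless the (hypothetical) `stop_at_first_stop` flag is set. The function names `translate` and `six_frames` are invented for this example, and the reverse frames are reversed but not complemented, as the question specifies.

```python
from itertools import product

# Standard genetic code, rebuilt compactly so this block is self-contained.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENCODE = {"".join(codon): aa
           for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq, offset=0, stop_at_first_stop=False):
    """Translate seq from offset; assumes only uppercase A, C, G, T."""
    protein = []
    # range(..., len(seq) - 2, 3) never yields an incomplete final codon.
    for i in range(offset, len(seq) - 2, 3):
        amino_acid = GENCODE[seq[i:i + 3]]
        protein.append(amino_acid)
        # Assumption: the '*' is kept in the output even when stopping early.
        if amino_acid == "*" and stop_at_first_stop:
            break
    return "".join(protein)


def six_frames(seq):
    """Return translations keyed x1..x3 (forward) and y1..y3 (reversed)."""
    reverse = seq[::-1]  # reversed only, NOT complemented, per the question
    frames = {}
    for offset in range(3):
        frames[f"x{offset + 1}"] = translate(seq, offset)
        frames[f"y{offset + 1}"] = translate(reverse, offset)
    return frames


if __name__ == "__main__":
    for key, protein in six_frames("ATGACAGTAGACAGATAGGGGACAGT").items():
        print(key, protein)
```

Whether translation should continue past a stop codon depends on the assignment's wording, which is why that behaviour is left as a flag here rather than decided either way.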

It's also worth mentioning that this is quite a common assignment/challenge, so you should find no shortage of solutions on here or on Stack Overflow.

Reply written 5 weeks ago by Joe

That makes sense, thank you very much.

Reply written 5 weeks ago by Ionic_Bond

Only add answers when you're answering the principal question. Otherwise, use Add Comment or Add Reply as appropriate.

Reply written 5 weeks ago by _r_am