Hi all. I'm working on a Python script that finds the ORF (open reading frame) of a sequence, all inside one function. I hope this is explained well enough to follow.
It works so far when both a start and a stop codon are found, or when only a stop is found. However, when there is a start but no stop codon, which is the case for one of my sequences, the function returns None.
Here's what I have thus far (might be indent issues from pasting):
def find_orf(sequence, gb):
    start_pos = sequence.find('GCCGCCACCATG')
    print "START " + str(start_pos)
    if start_pos >= 0:
        s_to_ATG = int(start_pos) + 9
        start = sequence[s_to_ATG:]
        for i in xrange(0, len(start), 3):
            stops = ["TAA", "TGA", "TAG"]
            codon = start[i:i+3]
            if codon in stops:
                orf = start[:i+3]
            else:
                orf = start
        return orf, start_pos
    elif start_pos < 0:
        stop_pos = sequence.find(str(gb[-12:]))
        begin_to_stop = int(stop_pos) + 12
        return sequence[:begin_to_stop], start_pos
    else:
        print "Error: There is no open-reading frame for this sequence!"
The part giving me issues is the if/else inside the for loop: it always seems to end up in the "else" branch. I know for sure that my first sequence has both a START and a STOP. The second sequence has a START but no STOP, so for that one I want the function to return everything from START to the end of the sequence, which in the code is the variable "start".
If I change the loop and its IF statement to this:
for i in xrange(0, len(start), 3):
    stops = ["TAA", "TGA", "TAG"]
    codon = start[i:i+3]
    if codon in stops:
        return start[:i+3], start_pos
it returns the first sequence from start to stop correctly, but the second comes back as None (NoneType), because it has no STOP. It may be something simple that I'm overlooking, but I was hoping someone could spot what is wrong with the first code example, so that a sequence with a start and a stop is returned from start to stop, and a sequence with no stop is returned from the start to the end. (The second part of the code works: if there is no start, it returns everything from the beginning to the stop.)
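The None in the second case is standard Python behaviour: a function that reaches its end without executing a return statement returns None. A minimal sketch of that effect follows. The helper name first_stop_only and the toy sequences are made up for illustration and are not part of the code above; it uses range rather than xrange so it also runs on Python 3.

```python
STOPS = ("TAA", "TGA", "TAG")

def first_stop_only(start):
    """Hypothetical helper: return `start` up to and including the first
    in-frame stop codon. There is deliberately no return after the loop."""
    for i in range(0, len(start), 3):
        if start[i:i+3] in STOPS:
            return start[:i+3]
    # Falling off the end here means Python returns None implicitly.

# Toy sequences (assumptions, not real data):
with_stop = first_stop_only("ATGAAATAGCCC")   # has an in-frame TAG
without_stop = first_stop_only("ATGAAACCC")   # no stop codon in frame
```

So the None is not caused by the sequence itself; the loop simply finishes without hitting the only return.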
All help is appreciated.
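One possible way to get the behaviour described above is to return as soon as the first in-frame stop codon is found and, if the loop finishes without finding one, return start unchanged. The sketch below assumes that is the intent. The constant names STOPS and START_MOTIF, the ValueError raised when the end marker is missing, and the toy sequences are illustrative choices, not part of the original code. It uses range instead of xrange so that it also runs on Python 3.

```python
STOPS = ("TAA", "TGA", "TAG")
START_MOTIF = "GCCGCCACCATG"  # motif from the post; its ATG begins at offset 9

def find_orf(sequence, gb):
    """Return (orf, start_pos); start_pos is -1 when the motif is absent."""
    start_pos = sequence.find(START_MOTIF)
    if start_pos >= 0:
        start = sequence[start_pos + 9:]        # sequence from the ATG onwards
        for i in range(0, len(start), 3):       # walk codon by codon
            if start[i:i+3] in STOPS:
                return start[:i+3], start_pos   # start through first stop
        return start, start_pos                 # no stop: start to the end
    # No start motif: fall back to the post's approach, using the last
    # 12 characters of gb as the end marker.
    stop_pos = sequence.find(gb[-12:])
    if stop_pos < 0:
        # Assumption: treat a missing marker as an error instead of
        # silently slicing with stop_pos == -1.
        raise ValueError("There is no open-reading frame for this sequence!")
    return sequence[:stop_pos + 12], start_pos

# Toy inputs (assumptions, not real data):
has_both = find_orf("TTGCCGCCACCATGAAATAGCC", "")
start_only = find_orf("TTGCCGCCACCATGAAACCC", "")
stop_only = find_orf("AAACCCGGGTTTTAAGG", "ATGAAACCCGGGTTTTAA")
```

The key difference from the if/else version is that nothing is assigned inside the loop: the function either returns at the first stop codon or falls through to the single fallback return after the loop.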