Question: Python - Find ORF in sequence, compound return statement
4.5 years ago, st.ph.n (Philadelphia, PA) wrote:

Hi all. I'm working on a Python script that finds the ORF in a sequence, inside a function. I hope this is explained well enough to follow:

It works so far if both a start and a stop codon are found, or just a stop. However, if there is a start but no stop codon, I run into a problem with one of my sequences: the return value from the function is "None".

Here's what I have thus far (might be indent issues from pasting):

def find_orf(sequence, gb):
        start_pos = sequence.find('GCCGCCACCATG')
        print "START " + str(start_pos)
        if start_pos >= 0:
                s_to_ATG = int(start_pos) + 9
                start = sequence[s_to_ATG:]
                for i in xrange(0, len(start), 3):
                        stops =["TAA", "TGA", "TAG"]
                        codon = start[i:i+3]
                        if codon in stops:
                                orf = start[:i+3]
                        else:
                                orf = start
                return orf, start_pos
        elif start_pos < 0:
                stop_pos = sequence.find(str(gb[-12:]))
                begin_to_stop = int(stop_pos) + 12
                return sequence[:begin_to_stop], start_pos

        else:
                print "Error: There is no open-reading frame for this sequence!"

The for loop with the if/else inside it is what is giving me issues. It seems to always go to the "else" statement. I know for sure that the first sequence I have contains a START and a STOP. The second sequence has a START but no STOP, so I want it to print from START to the end of the sequence, which in the code is the variable "start".

If I change this IF statement in the code to this:

for i in xrange(0, len(start), 3):
        stops =["TAA", "TGA", "TAG"]
        codon = start[i:i+3]
        if codon in stops: return start[:i+3], start_pos

it prints the first sequence from start to stop correctly, but the second is returned as "NoneType", because it has no STOP. It may be something simple that I'm overlooking, but I was hoping someone could see what is wrong with the first code example, so that if there is a start and a stop it prints the ORF correctly, and if there is no stop it prints from the start to the end. (The second part of the code works: if there is no start, it prints from the beginning to the stop.)
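The "None" in the second variant comes from Python itself: a function that reaches its end without executing a return statement returns None implicitly. Below is a minimal sketch of that early-return variant with a fallback return after the loop. It assumes Python 3 (range instead of xrange, unlike the code above), and the name orf_from_start and the test sequences are made up for illustration.

```python
STOPS = ("TAA", "TGA", "TAG")

def orf_from_start(start, start_pos):
    """Return (orf, start_pos) for a string that already begins at ATG."""
    for i in range(0, len(start), 3):
        if start[i:i + 3] in STOPS:
            # First in-frame stop: return everything up to and including it.
            return start[:i + 3], start_pos
    # No in-frame stop was found: return the rest of the sequence
    # instead of falling off the end of the function (which gives None).
    return start, start_pos

print(orf_from_start("ATGAAATAGCCC", 0))  # sequence with an in-frame stop
print(orf_from_start("ATGAAACCC", 0))     # sequence without a stop
```

With this shape there is no need for an else branch inside the loop at all; the fallback only runs when the loop finishes without finding a stop.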

All help is appreciated.

 

orf sequence codon python • 7.7k views
modified 4.5 years ago by Devon Ryan • written 4.5 years ago by st.ph.n

Quick question: are you sure the "if codon in stops" check is not failing because the comparison is case-sensitive (just in case the underlying sequence is in lower case)?

written 4.5 years ago by RamRS

@RamRS: The sequences are both uppercase, but to be safe I can add something to make sure it handles both cases.

written 4.5 years ago by st.ph.n
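If mixed case were ever a concern, one option is to normalize the sequence once before searching, rather than listing both cases in the stop list. A small sketch assuming Python 3; normalize is a hypothetical helper, not part of the script above, and the input string is made up.

```python
def normalize(sequence):
    """Upper-case a nucleotide string and strip surrounding whitespace."""
    return sequence.strip().upper()

seq = normalize("gccgccaccatgaaatag\n")
# Search for the start motif used in the question on the normalized string.
print(seq.find("GCCGCCACCATG"))
```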

A couple of trial-and-error suggestions again:

a. Let's move the stops = [...] assignment out of the loop (just being pedantic here)

b. Let's try range() instead of xrange()

If neither of those makes a difference, I'd love to have a sample sequence that I can use to test locally, please. Thank you!

written 4.5 years ago by RamRS

@RamRS, I've tried dropping the stops variable and writing "if codon in ["TAA", "TGA", "TAG"]:" directly, with the same result. I also tried range() instead of xrange(), and two returns in the if/else, even though I know that's not kosher.

written 4.5 years ago by st.ph.n

Devon's solution worked, right? I think I'm running low on coffee - should've spotted that right away. Sorry I led you on a merry wild goose chase!

written 4.5 years ago by RamRS

Yes, it worked. I gave it an upvote and accepted the answer. When I originally wrote the script I had a break there, when it was only looking for START-STOP. Then I realized I won't always find either one or the other. So I added the if/else for start_pos >= 0, and then had to figure out what to do if there was a start but no stop, and somewhere in my edits I lost the break. (Probably thinking it would finish there without catching the STOP codon.)

modified 4.5 years ago • written 4.5 years ago by st.ph.n

Have you tried just using biopython? The link even describes ORF finding.

written 4.5 years ago by Devon Ryan

@Devon Ryan, I've considered it. However, I'd rather use this as a learning exercise and develop it in plain Python.

written 4.5 years ago by st.ph.n

Can't argue with that!

written 4.5 years ago by Devon Ryan

Do you see any reason why the first code example goes to the else statement?

written 4.5 years ago by st.ph.n

It doesn't always; you just need to add a break in the if before it :)

Edit: I just posted an answer with the inserted edit in case the above didn't suffice.

modified 4.5 years ago • written 4.5 years ago by Devon Ryan
4.5 years ago, Devon Ryan (Freiburg, Germany) wrote:

You just need to add a break:

def find_orf(sequence, gb):
        start_pos = sequence.find('GCCGCCACCATG')
        print "START " + str(start_pos)
        if start_pos >= 0:
                s_to_ATG = int(start_pos) + 9
                start = sequence[s_to_ATG:]
                for i in xrange(0, len(start), 3):
                        stops =["TAA", "TGA", "TAG"]
                        codon = start[i:i+3]
                        if codon in stops:
                                orf = start[:i+3]
                                break
                        else:
                                orf = start
                return orf, start_pos
        elif start_pos < 0:
                stop_pos = sequence.find(str(gb[-12:]))
                begin_to_stop = int(stop_pos) + 12
                return sequence[:begin_to_stop], start_pos

        else:
                print "Error: There is no open-reading frame for this sequence!"

Without the break, you'll always iterate to the end of the sequence, and the else branch resets orf to start on the next codon that isn't a stop.
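A toy run makes this concrete. The sketch below assumes Python 3 and uses a hypothetical scan helper and a made-up sequence, so it is an illustration rather than the script above. It keeps the same if/else inside the loop and only toggles the break.

```python
STOPS = ("TAA", "TGA", "TAG")

def scan(start, use_break):
    """Walk codons in frame; optionally stop at the first stop codon."""
    orf = start  # default when no in-frame stop codon exists
    for i in range(0, len(start), 3):
        if start[i:i + 3] in STOPS:
            orf = start[:i + 3]
            if use_break:
                break
        else:
            # Mirrors the original else branch: resets orf on every
            # non-stop codon, including the ones after a stop.
            orf = start
    return orf

toy = "ATGAAATAGCCCGGG"
print(scan(toy, use_break=False))  # ATGAAATAGCCCGGG (stop was overwritten)
print(scan(toy, use_break=True))   # ATGAAATAG
```

Without the break, the version in the question only returns the truncated ORF when the stop codon happens to be the last codon visited.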

modified 4.5 years ago • written 4.5 years ago by Devon Ryan

I wish I'd seen that! It would seem I'm getting out of touch :(

written 4.5 years ago by RamRS

You just need more coffee :)

written 4.5 years ago by Devon Ryan