Find ORFs in a sequence with Python
7.5 years ago
elisheva ▴ 120

Hello everyone. I'm working on a Python script that finds ORFs in a sequence. It works well only for short sequences. When I give it a much longer sequence, it gives the wrong answer (compared with this server: https://www.ncbi.nlm.nih.gov/orffinder/). Can anybody find the problem in my code?

my code:

def codons(seq):
    stops = ["TTA", "TGA", "TAG"]
    lst1 = []  # List for the start codons.
    lst2 = []  # List for the stop codons.
    start = 0  # The start position of the sequence.
    counter = 0  # Counter for the 3 optional ORFs.
    # Initialize the lists for the 3 optional ORFs.
    for i in range(3):
        lst1.append([])
        lst2.append([])
    # Add the positions of the start and stop codons to the lists.
    while seq and counter < 3:
        for i in range(start, len(seq), 3):
            codon = seq[i:i+3]  # The codon is 3 nucleotides.
            if codon == "ATG":  # If the codon is a start codon.
                lst1[start].append(i + 1)  # The position of the start codon.
            if codon in stops:  # If the codon is a stop codon.
                lst2[start].append(i + 1)  # The position of the stop codon.
        start += 1  # Advance the starting position.
        counter += 1  # Advance the counter.
    print(lst1)
    print("------------------")
    print(lst2)
    return lst1, lst2

thanks


Maybe because you miss ORFs on the reverse strand? Maybe other start codons? (GUG, UUG)
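If the reverse strand does turn out to matter, one way to cover it is to run the same scan on the reverse complement and map positions back. This is only a minimal sketch: `reverse_complement` and `to_forward_position` are hypothetical helper names, not part of the script in the question, and the input is assumed to contain only A, C, G, T or N.

```python
# Hypothetical helpers, not part of the original script.
# Assumes the sequence contains only A, C, G, T or N.
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def to_forward_position(pos, seq_len):
    """Map a 1-based position on the reverse strand to forward-strand coordinates."""
    return seq_len - pos + 1
```

Passing `reverse_complement(seq)` to the same codon scan would then give the three reverse-strand frames.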


No, that's not the problem. I was checking the results only on the leading strand. And the start codon is only ATG.


This may not be the cause of your problem, but your first stop codon is wrong.

stops = ["TTA","TGA","TAG"]

it should be TAA instead of TTA.
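A minimal sketch of the scan with the corrected stop codons, for one reading frame at a time. The name `codon_positions` is made up for illustration. Stopping the loop at `len(seq) - 2` is my own addition so that a trailing fragment of one or two bases is never compared as if it were a codon.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}  # TAA, not TTA


def codon_positions(seq, frame):
    """Return 1-based positions of ATG and stop codons in one frame (0, 1 or 2)."""
    starts, stops = [], []
    # Stop at len(seq) - 2 so that every slice is a complete codon.
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon == "ATG":
            starts.append(i + 1)
        elif codon in STOP_CODONS:
            stops.append(i + 1)
    return starts, stops
```

For example, `codon_positions("ATGAAATAA", 0)` gives `([1], [7])`: the ATG at position 1 and the TAA starting at position 7.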


Thank you!! I've changed it, but it still doesn't give correct results. For example, for this sequence from NCBI: https://www.ncbi.nlm.nih.gov/nuccore/58255/ it returns lists of start and stop codon positions that differ from the ORFfinder results.


See this post, it may help: Python- Find ORF in sequence, compound return statement

By the way, there are several similar posts on the right panel.


What's the output of your script and what's the expected output? That might help to find out what your problem is.


Thank you for your help. I've found what the problem was (in ORFfinder, the stop codon position is represented by its end). But I have to find ORFs at least 300 nucleotides in length. ORFfinder finds 3 ORFs on the leading strand, and my script somehow gives me duplicates. Maybe the problem is in another function; I have to check it.
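One possible source of duplicates is pairing every ATG with the same downstream stop codon. The sketch below is a guess at that, not a confirmed diagnosis: it keeps only the first ATG after the previous in-frame stop, reports the end of the ORF as the last base of the stop codon, and filters by length. The name `find_orfs`, the `min_len` parameter and the choice to count the stop codon in the length are all my assumptions.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}


def find_orfs(seq, min_len=300):
    """Return (frame, start, end, length) tuples for ORFs on the given strand.

    frame:  1, 2 or 3
    start:  1-based position of the first base of the ATG
    end:    1-based position of the last base of the stop codon
    length: end - start + 1 (assumption: the stop codon is counted)
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        orf_start = None  # 0-based index of the first ATG since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "ATG" and orf_start is None:
                orf_start = i  # later in-frame ATGs belong to the same ORF
            elif codon in STOP_CODONS:
                if orf_start is not None:
                    length = i + 3 - orf_start
                    if length >= min_len:
                        orfs.append((frame + 1, orf_start + 1, i + 3, length))
                orf_start = None  # the next ORF must start after this stop
    return orfs
```

With a short test sequence and a lowered threshold, `find_orfs("CCATGAAAATGCCCTAAGG", min_len=9)` returns a single ORF in frame 3, even though that frame contains two ATGs before the stop.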


The indentation of your code is unclear in this post. Try to fix it using four spaces per level, or use a github gist
