Question: find ORF in sequence python
elisheva (80), Israel, wrote 2.6 years ago:

Hello everyone. I'm working on a Python script that finds ORFs in a sequence. It works well only for short sequences. When I input a much longer sequence, it gives me the wrong answer (according to this server: https://www.ncbi.nlm.nih.gov/orffinder/). Can anybody find what the problem with my code is?

my code:

def codons(seq):
    stops = ["TTA","TGA","TAG"]
    lst1 = [] #List for the start codons
    lst2 = [] #List for the stop codons
    start = 0 #The start position of the sequence.
    counter = 0 #Counter for 3 optional orfs.
    #Initializes the lists for 3 optional orfs.
    for i in range(3):
        lst1.append([])
        lst2.append([])
    #Add to the lists the positions of the start and stop codons.
    while (seq and counter < 3):
        for i in range(start,len(seq),3):
            codon = seq[i:i+3] #The codon is 3 nucleotides.
            #print codon+ "\t"
            if(codon == "ATG"): #If the codon is a start codon.
                lst1[start].append(i+1) #The position of the start codon.
            if(codon in stops): #If the codon is a stop codon.
                lst2[start].append(i+1) #The position of the stop codon.
        start += 1 #Advances the starting position to the next frame.
        counter += 1 #Advances the counter.
    print(lst1)
    print("------------------")
    print(lst2)
    return lst1,lst2

thanks

modified 2.6 years ago by natasha.sernova (3.5k) • written 2.6 years ago by elisheva (80)

Maybe because you are missing ORFs on the reverse strand? Or other start codons (GUG, UUG)?

written 2.6 years ago by Asaf (6.1k)
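For readers who do want to check the reverse strand as suggested above, here is a minimal sketch of enumerating all six reading frames. The names `reverse_complement` and `six_frames` are illustrative and not part of the original script; the sketch assumes an uppercase DNA string containing only A, C, G, T or N. Alternative start codons would appear as GTG and TTG in a DNA sequence.

```python
# Sketch only: helper names are hypothetical, not from the original post.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def six_frames(seq):
    """Yield (strand, frame, subsequence) for all six reading frames."""
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            # Each frame simply starts 0, 1 or 2 bases into the strand.
            yield strand, frame, strand_seq[frame:]
```

Positions found on the "-" strand are relative to the reverse complement, so they would still need converting back to forward-strand coordinates before comparing with another tool.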

No, that's not the problem. I was checking the results only on the leading strand. And the start codon is only ATG.

written 2.6 years ago by elisheva (80)

That does not look like the cause, but your first stop codon is wrong.

stops = ["TTA","TGA","TAG"]

It should be TAA instead of TTA.

modified 2.6 years ago • written 2.6 years ago by natasha.sernova (3.5k)
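To make the fix concrete, here is a hedged sketch of the same per-frame scan with the stop codon set corrected to TAA, TGA and TAG. The function name `codon_positions` and its `start_codon` parameter are illustrative. Unlike the original, the loop stops at `len(seq) - 2` so that only complete codons are examined; positions are 1-based and refer to the first base of each codon, as in the question.

```python
# Sketch only: a compact variant of the scan in the question.
STOP_CODONS = {"TAA", "TGA", "TAG"}  # TAA, not TTA

def codon_positions(seq, start_codon="ATG"):
    """Return (starts, stops): 1-based codon positions for frames 0, 1 and 2."""
    starts = [[], [], []]
    stops = [[], [], []]
    for frame in range(3):
        # len(seq) - 2 ensures seq[i:i + 3] is always a full codon.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == start_codon:
                starts[frame].append(i + 1)
            elif codon in STOP_CODONS:
                stops[frame].append(i + 1)
    return starts, stops
```

For example, `codon_positions("ATGAAATAA")` reports a start at position 1 and a stop at position 7 in the first frame, plus a TGA at position 2 in the second frame.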

Thank you!! I've changed it, but it still doesn't give correct results. For example, for this sequence from NCBI: https://www.ncbi.nlm.nih.gov/nuccore/58255/ it returns lists of start and stop codon positions that differ from the results of the ORF finder.

written 2.6 years ago by elisheva (80)

See this post, it may help: Python- Find ORF in sequence, compound return statement

By the way, there are several similar posts on the right panel.

modified 2.6 years ago • written 2.6 years ago by natasha.sernova (3.5k)

What's the output of your script and what's the expected output? That might help to find out what your problem is.

written 2.6 years ago by cschu181 (1.7k)

Thank you for your help. I've found what the problem was (in ORFfinder the stop codon position is represented by its end). But I have to find ORFs at least 300 nucleotides in length. ORFfinder finds 3 ORFs on the leading strand, and my script somehow gives me duplicates. Maybe the problem is in another function; I have to check it.

written 2.6 years ago by elisheva (80)
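One possible way to handle the three points raised above (end coordinate of the stop codon, a minimum length, and duplicates) is sketched below. This is not the poster's other function, which was never shown; `find_orfs` and `min_len` are hypothetical names. The sketch assumes that an ORF runs from the first ATG after the previous in-frame stop to the next in-frame stop, that the end is reported as the last base of the stop codon, and that the length counts the stop codon. Whether ORFfinder measures length in exactly this way is an assumption.

```python
# Sketch only: forward strand, ATG starts, one ORF per stop codon.
STOP_CODONS = {"TAA", "TGA", "TAG"}

def find_orfs(seq, min_len=300):
    """Return sorted (start, end, frame) tuples with 1-based inclusive coordinates."""
    orfs = []
    for frame in range(3):
        orf_start = None  # 0-based index of the first ATG since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "ATG" and orf_start is None:
                orf_start = i  # later in-frame ATGs are ignored, avoiding duplicates
            elif codon in STOP_CODONS:
                if orf_start is not None:
                    end = i + 3  # 1-based position of the stop codon's last base
                    if end - orf_start >= min_len:
                        orfs.append((orf_start + 1, end, frame + 1))
                orf_start = None  # a stop always closes the current ORF
    return sorted(orfs)
```

Keeping only the first ATG per stop means nested ORFs that share a stop codon are reported once, which is one plausible source of the duplicates described.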

The indentation of your code is unclear in this post. Try to fix it using four spaces per level, or use a GitHub gist.

written 2.6 years ago by WouterDeCoster (40k)