Question: How Do I Loop Over Sequences With Biopython?
8.6 years ago by
T20
T20 wrote:

Hi,

I'm new to Biopython. I can't seem to get nested for loops to iterate properly. Here's a simple example:

from Bio import SeqIO

infile = open('testseq.fna')
midfile = open('mids.fna')

c = 0

for midseq in SeqIO.parse(midfile, "fasta"):
    print(midseq.id)
    print(midseq.seq)
    for line in SeqIO.parse(infile, "fasta"):
        print(line.seq)

I have 12 simple FASTA records in testseq.fna and 96 mid identifiers in mids.fna. I should get a list of 96 mid ids and seqs, each followed by the 12 testseq sequences, but what I get is the first mid followed by the sequences, and then just the other mids with no sequences after them. Run it and you will see what I mean. I'm pulling my hair out: why doesn't Python run the 'line' loop for each 'mid' loop like it should?

Thanks for any help. I know it's surprising, but I couldn't find an answer to this anywhere (on the Python forum they were just rude!).

Theo

8.6 years ago by
Istvan Albert ♦♦ 85k
University Park, USA
Istvan Albert ♦♦ 85k wrote:

Your stream reaches the end of the file by the end of the first iteration.

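Move the line:

infile = open('testseq.fna')

inside the loop, like so:

for midseq in SeqIO.parse(midfile, "fasta"):
    infile = open('testseq.fna')  # reopen so the inner parse starts at the first record
    for line in SeqIO.parse(infile, "fasta"):
        print(line.seq)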
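
An alternative to reopening the file on every pass, offered only as a sketch: read the inner records into a list once and loop over that list. To keep it runnable without Biopython or the original files, `read_fasta` below is a hypothetical stdlib stand-in for `SeqIO.parse`, and the two `io.StringIO` objects hold made-up data in place of mids.fna and testseq.fna. With Biopython the same idea would be `list(SeqIO.parse(infile, "fasta"))`.

```python
import io


def read_fasta(handle):
    """Hypothetical stand-in for SeqIO.parse: lazily yield (id, sequence) pairs."""
    seq_id, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            seq_id, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


# In-memory stand-ins for mids.fna and testseq.fna (made-up data)
midfile = io.StringIO(">mid1\nACGT\n>mid2\nTTGA\n")
infile = io.StringIO(">seq1\nGGCC\n>seq2\nAATT\n")

# Read the inner records once; a list can be looped over any number of times
test_records = list(read_fasta(infile))

pairs = []
for mid_id, mid_seq in read_fasta(midfile):
    for rec_id, rec_seq in test_records:
        pairs.append((mid_id, rec_id))

print(pairs)
```

This trades memory for speed: the inner file is parsed once instead of once per mid, which is fine for 12 records but may not suit very large files.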
8.6 years ago by
T20
T20 wrote:

Brilliant! Thanks. You have no idea how long that has taken to find out! I still don't know why it works, though.

Is it that opening the file again for each primary loop resets the iteration? I can't find anywhere in a Python manual where it says you have to do that! If it was a list and not a file would it have to be defined in the loop like that too?

Thanks again,

Theo


When you open a file you open a stream to it. Once that stream runs out, you need to either go back to the beginning with a seek operation or open the file again in a new stream. Each time you open the file you get an entirely new stream to the same content, so you can be at different locations in the same file if you open it in different streams.

Reply written 8.6 years ago by Istvan Albert ♦♦ 85k
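
The behaviour described in this reply can be checked with a small sketch using only the standard library. The temporary file and its three lines are made up for illustration; nothing here is specific to Biopython.

```python
import os
import tempfile

# Write a small throwaway file so the sketch is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

with open(path) as stream:
    first_pass = list(stream)    # reads to the end of the file
    second_pass = list(stream)   # the stream has run out: nothing left
    stream.seek(0)               # go back to the beginning
    after_seek = list(stream)    # all three lines again

# Two independent streams over the same file keep separate positions
with open(path) as a, open(path) as b:
    a.readline()                 # advance only stream a
    line_a = a.readline()
    line_b = b.readline()

os.remove(path)

print(len(first_pass), len(second_pass), len(after_seek))  # 3 0 3
print(line_a.strip(), line_b.strip())                      # second first
```

In the original script, calling `infile.seek(0)` at the top of the outer loop should have the same effect as reopening the file there.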

Essentially, what's going on here is that a file acts more like an iterator than like a list. Try running through an iterator (made by something like iterator = iter([1,2,3])) in a loop multiple times (for example, for x in iterator: print(x) nested inside for i in range(3):), and you'll see that it only runs through the items 1, 2, 3 in the first inner for loop and acts as empty after that, unlike a list, which would act the same in every inner for loop. But you're right, the Python manual isn't very explicit about that.

Reply written 8.6 years ago by Weronika300
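
Written out in full, the experiment suggested in this reply might look like the sketch below. The variable names `from_list` and `from_iterator` are just illustrative; they collect the values each inner loop sees so the two cases can be compared side by side.

```python
numbers = [1, 2, 3]

# Looping over the list: every outer pass sees all three items
from_list = []
for i in range(3):
    for x in numbers:
        from_list.append(x)

# Looping over one shared iterator: only the first outer pass sees anything
iterator = iter(numbers)
from_iterator = []
for i in range(3):
    for x in iterator:
        from_iterator.append(x)

print(from_list)      # [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(from_iterator)  # [1, 2, 3]
```

So a list would not need to be redefined inside the loop, while anything that is consumed as it is read (an open file, a parser over that file) does need to be recreated or rewound.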