SeqIO object gets cleared after being accessed
2.2 years ago
zincjiang • 0

I'm using Biopython to parse a FASTQ file, and I found that the SeqIO object appears to be cleared once I have accessed it.

from Bio import SeqIO
record_fastqIO = SeqIO.parse('SRR835775_1.first1000.fastq','fastq')
for record in record_fastqIO:
    print(record.id)

This script works perfectly. But if I add one line to the script:

from Bio import SeqIO
record_fastqIO = SeqIO.parse('SRR835775_1.first1000.fastq','fastq')
record_dict = SeqIO.to_dict(record_fastqIO)  # this line
for record in record_fastqIO:
    print(record.id)

Nothing is printed, and there is no error. It seems that the object record_fastqIO gets cleared after SeqIO.to_dict() is called on it.

And also in this script:

from Bio import SeqIO
record_fastqIO = SeqIO.parse('SRR835775_1.first1000.fastq','fastq')
def get_phred_range(fastqIO): # to get the max and min quality
    qual_max = []
    qual_min = []
    for record in fastqIO:
        qual_max.append(max(record.letter_annotations['phred_quality']))
        qual_min.append(min(record.letter_annotations['phred_quality']))
    phred_max = max(qual_max)
    phred_min = min(qual_min)
    return phred_max,phred_min
x,y = get_phred_range(record_fastqIO)
print('x,y:%s,%s' % (x,y))
z,w = get_phred_range(record_fastqIO)      # exactly the same as x,y
print('z,w:%s,%s' % (z,w))

This gives me:

x,y:41,2
Traceback (most recent call last):
  File "c:\Users\zincj\Desktop\Untitled-1.py", line 36, in <module>
    z,w = get_phred_range(record_fastqIO)
  File "c:\Users\zincj\Desktop\Untitled-1.py", line 12, in get_phred_range
    phred_max = max(qual_max)
ValueError: max() arg is an empty sequence

So I'm doing the same thing twice. The first call goes smoothly, which suggests there is nothing wrong with my function, but the second call raises an error.

Again, it seems that the SeqIO object record_fastqIO got cleared after I used it.

Has anyone encountered this before? Or is there something wrong with my script?

Biopython • 541 views
2.2 years ago
liorglic ★ 1.4k

It wasn't cleared; the iterator simply reached its end.
SeqIO.parse returns an iterator, not a list. SeqIO.to_dict consumes that iterator completely, so when your for loop then tries to iterate over it, there are no records left and the loop body never runs. You can confirm this by calling next(record_fastqIO) afterwards: it raises a StopIteration exception, which means the iterator is exhausted. The same thing happens in your second script: the first call to get_phred_range uses up the iterator, so the second call builds empty lists and max() fails. To iterate again, just call SeqIO.parse() again and you'll get a new iterator for your for loop. However, since you have already loaded all the data into RAM with SeqIO.to_dict, I don't see why re-parsing should be needed.
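
The behaviour described above can be reproduced without Biopython. The following is only a minimal sketch: parse_reads is a hypothetical generator standing in for SeqIO.parse, and the quality scores are made up for illustration. It shows the second pass failing on an exhausted iterator, followed by two possible ways around it.

```python
def parse_reads(raw):
    """Hypothetical stand-in for SeqIO.parse: lazily yields (read_id, qualities)."""
    for read_id, quals in raw:
        yield read_id, quals


def get_phred_range(reads):
    """Return (max, min) quality seen across all reads in a single pass."""
    qual_max, qual_min = [], []
    for _, quals in reads:
        qual_max.append(max(quals))
        qual_min.append(min(quals))
    return max(qual_max), min(qual_min)


# Made-up quality scores, for illustration only.
raw = [("read1", [2, 30, 41]), ("read2", [10, 35, 40])]

reads = parse_reads(raw)
first = get_phred_range(reads)       # consumes the generator

try:
    get_phred_range(reads)           # nothing left to iterate over
    second_failed = False
except ValueError:
    second_failed = True             # max() was given an empty list

# Fix 1: create a fresh iterator for each pass.
again = get_phred_range(parse_reads(raw))

# Fix 2: materialize once, then reuse the list (costs memory).
cached = list(parse_reads(raw))
from_list_1 = get_phred_range(cached)
from_list_2 = get_phred_range(cached)

print(first, second_failed, again, from_list_1, from_list_2)
```

With Biopython, fix 1 corresponds to calling SeqIO.parse() again before each pass, and fix 2 to something like records = list(SeqIO.parse('SRR835775_1.first1000.fastq', 'fastq')). For large files the first option keeps memory use low, while the second avoids reading the file twice.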
