How To Differentiate Files With One Record From Files With Multiple Records?
9.1 years ago

I'm working with Biopython, Python, and GTK to create a program that loads files of bioinformatic interest.

These files have multiple sequences in them:

http://biopython.org/DIST/docs/tutorial/examples/ls_orchid.gbk

http://biopython.org/DIST/docs/tutorial/examples/ls_orchid.fasta

But these only have one (long) sequence:

http://biopython.org/SRC/biopython/Tests/GenBank/NC_005816.gb

http://biopython.org/SRC/biopython/Tests/GenBank/NC_005816.fna

Is there any way to know this before processing the file? How can I differentiate files with one sequence from files with multiple sequences? I want to know exactly when to use Bio.SeqIO.read() and when to use Bio.SeqIO.parse().

Thanks for your time. I tried searching for answers but didn't find anything similar.

python biopython file fasta genbank • 2.2k views

Is there any way to know this before processing the file?

You'd have to process it somehow to determine whether the file contains one or multiple sequences. Given this, consider using Bio.SeqIO.parse(), since it handles both cases.
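You could check before handing the file to Biopython by quickly scanning for record header lines. This is a heuristic sketch, not a Biopython feature. It assumes well-formed files in which every FASTA record starts with a line beginning `>` and every GenBank record starts with a line beginning `LOCUS`. The name `has_multiple_records` is hypothetical.

```python
from typing import TextIO

# Assumption: line prefixes that mark the start of a new record in well-formed files.
RECORD_START = {"fasta": ">", "genbank": "LOCUS"}


def has_multiple_records(handle: TextIO, fmt: str) -> bool:
    """Return True as soon as a second record header line is seen."""
    prefix = RECORD_START[fmt]
    seen = 0
    for line in handle:
        if line.startswith(prefix):
            seen += 1
            if seen > 1:
                return True  # stop early; the rest of the file is not read
    return False  # zero or one header line in the whole file
```

The function stops at the second header. For a file with a single long sequence, however, it still reads every line, so the file really does get processed either way. If you reuse the same handle afterwards, rewind it with `handle.seek(0)` before passing it to `Bio.SeqIO`.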

9.1 years ago
Peter 6.0k

If you don't know how many records there are, assume at least one, and use Bio.SeqIO.parse() with a for loop. If the file happens to have only one record, your code will just do the for loop once. Easy :)
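A minimal sketch of that pattern follows. It assumes Biopython is installed. The function name `summarize` is hypothetical; it collects each record's ID and sequence length, and the same loop works whether the file holds one record or many.

```python
from Bio import SeqIO


def summarize(path_or_handle, fmt):
    """Return (record id, sequence length) for every record in the file."""
    summary = []
    # SeqIO.parse yields records one at a time; a one-record file loops once.
    for record in SeqIO.parse(path_or_handle, fmt):
        summary.append((record.id, len(record.seq)))
    return summary
```

For example, you could call `summarize("ls_orchid.fasta", "fasta")` or `summarize("NC_005816.gb", "genbank")`, assuming the files linked in the question have been downloaded to the working directory. The second call simply returns a one-element list.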


Thanks. I'm testing the loading times for different files and want to go with the most optimized code.


Well, internally Bio.SeqIO.read() calls Bio.SeqIO.parse() anyway and then checks that there was exactly one record.
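The check described here can be sketched as a small generic helper. `read_exactly_one` is a hypothetical name and is not part of Biopython. It only approximates the behavior described: it pulls the first record from an iterator, then tries to pull one more to confirm there is no second record.

```python
from typing import Iterable, TypeVar

T = TypeVar("T")


def read_exactly_one(records: Iterable[T]) -> T:
    """Return the only item in `records`; raise ValueError for zero or several."""
    iterator = iter(records)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("No records found") from None
    # Pulling one more item is enough to detect a multi-record input.
    try:
        next(iterator)
    except StopIteration:
        return first
    raise ValueError("More than one record found")
```

With Biopython, `read_exactly_one(SeqIO.parse("NC_005816.gb", "genbank"))` would act like a single-record read. Detecting a second record means parsing it, so this approach is not cheaper than a plain `parse()` loop.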
