How To Differentiate Files With One Record From Files With Multiple Records?
9.1 years ago

I'm working with Biopython, Python, and GTK to create a program that loads files of bioinformatic interest.

These files contain multiple sequences:

http://biopython.org/DIST/docs/tutorial/examples/ls_orchid.gbk

http://biopython.org/DIST/docs/tutorial/examples/ls_orchid.fasta

But these contain only one (long) sequence:

http://biopython.org/SRC/biopython/Tests/GenBank/NC_005816.gb

http://biopython.org/SRC/biopython/Tests/GenBank/NC_005816.fna

Is there any way to know this before processing the file? How can I differentiate the files with one sequence from those with multiple sequences? I want to know exactly when to use Bio.SeqIO.read() and when to use Bio.SeqIO.parse().

Thanks for your time. I tried to search for answers, but I didn't find anything similar to this.

python biopython file fasta genbank

Is there any way to know this before processing the file?

You'd have to process it somehow to determine whether the file contains one or multiple sequences. Given this, consider using Bio.SeqIO.parse(), since it handles both cases.

9.1 years ago
Peter 6.0k

If you don't know how many records there are, assume at least one, and use Bio.SeqIO.parse() with a for loop. If the file happens to have only one record, your code will just do the for loop once. Easy :)
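To make that concrete, here is a minimal sketch of the for-loop approach. The FASTA strings and the helper name load_ids are made up for illustration, and StringIO stands in for real files; with a file on disk you would pass the path or an open handle instead.

```python
from io import StringIO

from Bio import SeqIO

# Tiny made-up FASTA inputs standing in for real files (illustration only).
MULTI_FASTA = ">seq1\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n"
SINGLE_FASTA = ">only\nACGTACGT\n"


def load_ids(handle, fmt="fasta"):
    """Hypothetical helper: collect the ID of every record in the handle."""
    ids = []
    # The same loop works whether the file holds one record or many.
    for record in SeqIO.parse(handle, fmt):
        ids.append(record.id)
    return ids


multi_ids = load_ids(StringIO(MULTI_FASTA))
single_ids = load_ids(StringIO(SINGLE_FASTA))
print(multi_ids)
print(single_ids)
```

The calling code never needs to know in advance how many records there are; a single-record file simply yields a one-element list.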


Thanks, I'm testing the loading times for different files and just want to go with the most optimized code.


Well, internally Bio.SeqIO.read() calls Bio.SeqIO.parse() anyway, and checks that there was exactly one record.
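Roughly speaking, that "exactly one record" check can be sketched in plain Python like this. This is not Biopython's actual source, just an illustration of the idea; the function name read_exactly_one is hypothetical, and it accepts any iterator, such as the one returned by Bio.SeqIO.parse().

```python
def read_exactly_one(records):
    """Return the only item from an iterator, or raise ValueError.

    Illustrative sketch of a read()-style check layered on a
    parse()-style iterator; not the library's real implementation.
    """
    iterator = iter(records)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("No records found") from None
    try:
        # If a second item exists, there was more than one record.
        next(iterator)
    except StopIteration:
        return first
    raise ValueError("More than one record found")


# Usage with plain lists standing in for parsed records.
only = read_exactly_one(["record_a"])
print(only)
```

So read() cannot be faster than parse() on the same file; it only adds the check. If you do call read() on a file of unknown size, be ready to catch ValueError.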