I am writing some unit tests and encountered odd behavior from SeqIO in Biopython. I have reproduced the problem in the following script:
import mock
from Bio import SeqIO

FAKE_FASTA_CONTENT = '''>1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>2
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>3
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
'''

# https://docs.python.org/3.3/library/unittest.mock.html
m = mock.mock_open(read_data=FAKE_FASTA_CONTENT)
with mock.patch('__main__.open', m, create=True):
    with open('foo') as inf:
        io = SeqIO.parse(inf, 'fasta')
        for k, rec in enumerate(io):
            print(rec)
The output is:

ID: 1
Name: 1
Description: 1
Number of features: 0
Seq('AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA...AAA', SingleLetterAlphabet())
ID: 2
Name: 2
Description: 2
Number of features: 0
Seq('AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA...AAA', SingleLetterAlphabet())
I wonder why it cannot find the third (and last) sequence in the input.
Pasting FAKE_FASTA_CONTENT into a text file and parsing that file does work, so what is going on in this test case? How does SeqIO do its parsing?
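
To narrow this down, here is a minimal sketch of a check that leaves SeqIO out of the picture and only looks at which lines each handle yields when iterated. It rests on assumptions: `count_fasta_headers` is a hypothetical helper (not part of Biopython), `FASTA_TEXT` is a shortened stand-in for FAKE_FASTA_CONTENT, `io.StringIO` stands in for the real text file, and the standard-library `unittest.mock` is used instead of the separate `mock` package. It does not reproduce how SeqIO actually reads the handle.

```python
import io
from unittest import mock

# Shortened stand-in for FAKE_FASTA_CONTENT (assumption: sequence length is irrelevant here).
FASTA_TEXT = ">1\nAAAA\n>2\nAAAA\n>3\nAAAA\n"


def count_fasta_headers(handle):
    """Hypothetical helper: count lines starting with '>' while iterating the handle."""
    return sum(1 for line in handle if line.startswith(">"))


# Control case: a file-like object that behaves like a real text file.
real_like = io.StringIO(FASTA_TEXT)
print("StringIO headers:", count_fasta_headers(real_like))

# Mocked case: the handle returned by calling the mock_open mock.
# What it yields depends on the mock version, so no result is assumed here.
mocked_handle = mock.mock_open(read_data=FASTA_TEXT)()
print("mock_open headers:", count_fasta_headers(mocked_handle))
```

If the two counts differ on a given setup, the mocked handle is not feeding the same lines to the caller as a real file would, which would point at the mock rather than at the FASTA text.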