Biopython SeqIO.parse fails on some entries in a multi-record GenBank file
2.0 years ago
mvhanson

I'm working with some GenBank sequence files and have the following code:

for seq_record in SeqIO.parse("datafile_location, "genbank"):

It runs through most of the records in the file (which contains multiple sequences), but then fails with the error shown below. Any thoughts on how to fix this?

Should I just delete the offending record? Parsing gets to record 92126 of 93145 and then throws the error.

I have tried re-downloading the seq file, but that doesn't fix the problem.

  File "C:\python38\lib\site-packages\Bio\GenBank\Scanner.py", line 516, in parse_records
    record = self.parse(handle, do_features)
  File "C:\python38\lib\site-packages\Bio\GenBank\Scanner.py", line 499, in parse
    if self.feed(handle, consumer, do_features):
  File "C:\python38\lib\site-packages\Bio\GenBank\Scanner.py", line 466, in feed
    self._feed_header_lines(consumer, self.parse_header())
  File "C:\python38\lib\site-packages\Bio\GenBank\Scanner.py", line 1801, in _feed_header_lines
    previous_value_line = structured_comment_dict[
KeyError: 'Assembly-Data'

SeqIO biopython SeqIO.parse genbank files • 871 views
Try installing Biopython from the GitHub source. This issue appears to be fixed there; see the discussion here.
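
If upgrading is not an option, one possible workaround is to skip the offending record, as the question suggests. Below is a minimal sketch, not a tested fix for this particular file. It assumes every record ends with a line containing only `//` (the usual GenBank flat-file terminator) and parses each record on its own, so a failure in one record does not stop the loop. The helper names `iter_genbank_chunks` and `parse_tolerantly` are made up for illustration and are not part of Biopython; only `SeqIO.read` is a Biopython call.

```python
from io import StringIO


def iter_genbank_chunks(handle):
    """Yield the text of each record, split on the '//' terminator line."""
    lines = []
    for line in handle:
        lines.append(line)
        if line.rstrip() == "//":
            yield "".join(lines)
            lines = []
    # Trailing text without a terminator (pure whitespace is ignored).
    if "".join(lines).strip():
        yield "".join(lines)


def parse_tolerantly(path):
    """Yield SeqRecords, reporting and skipping records that fail to parse."""
    # Imported here so the splitter above works without Biopython installed.
    from Bio import SeqIO

    with open(path) as handle:
        for index, chunk in enumerate(iter_genbank_chunks(handle), start=1):
            try:
                # SeqIO.read expects exactly one record in the handle.
                yield SeqIO.read(StringIO(chunk), "genbank")
            except (KeyError, ValueError) as err:
                print(f"Skipping record {index}: {err!r}")
```

You would then loop with `for seq_record in parse_tolerantly("datafile_location"):` in place of the `SeqIO.parse` call. The printed index tells you which record to inspect or remove by hand. Catching only `KeyError` and `ValueError` is a deliberate choice here so that unrelated problems (a missing file, for example) still surface.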

Can you post your full code here? There is a " missing in the for seq_record in SeqIO.parse("datafile_location, "genbank"): loop, but I guess that's a typo in the post rather than in the code?

