I am trying to split a combined Synechococcus GenBank file from NCBI into separate GenBank files, one per genome. Some of the genomes have several GenBank records because they are draft assemblies. So I import SeqIO from Bio, parse the combined GenBank file, put the records into a dictionary of lists with gb.name as the key, then iterate through the dictionary and write each list out with SeqIO.write. However, the output does not include the nucleotide sequence that is usually in a FULL GenBank file, just the features. I've looked through the Biopython Tutorial and previous questions on BioStars, but I still can't see what I am doing wrong. Any help?
Original data: https://www.dropbox.com/s/kfwothubsrh5vtc/synechococcus.gb?dl=0
    from collections import defaultdict
    from Bio import SeqIO

    gbDict = defaultdict(list)
    gbs = SeqIO.parse("synechococcus.gb", "genbank")
    for gb in gbs:
        gbDict[gb.name].append(gb)

    for gb in gbDict:
        SeqIO.write(gbDict[gb], ".".join([gb, "gb"]), "genbank")
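A sanity check that might help narrow this down: the sketch below uses only the standard library to report, for each record in the input text, whether it actually contains sequence lines. It assumes the usual GenBank flat-file layout, where a record starts with a LOCUS line, the sequence follows an ORIGIN line, and the record ends with `//`. The function name `summarize_genbank_text` is made up for illustration and is not part of Biopython.

```python
from collections import defaultdict


def summarize_genbank_text(text):
    """Map each LOCUS name to a list of booleans, one per record.

    A value is True if that record has at least one non-empty line
    between its ORIGIN line and its terminating "//" line.
    Assumes the standard GenBank flat-file layout.
    """
    summary = defaultdict(list)
    name = None           # LOCUS name of the record being read
    in_origin = False     # True once the ORIGIN line has been seen
    has_sequence = False  # True once a sequence line has been seen

    for line in text.splitlines():
        if line.startswith("LOCUS"):
            parts = line.split()
            name = parts[1] if len(parts) > 1 else ""
            in_origin = False
            has_sequence = False
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            # End of record: store the result under its LOCUS name.
            if name is not None:
                summary[name].append(has_sequence)
            name = None
            in_origin = False
        elif in_origin and line.strip():
            has_sequence = True

    return dict(summary)
```

Running this on the contents of synechococcus.gb (for example via `open("synechococcus.gb").read()`) should show whether the sequence is already absent from the input records before SeqIO.write is ever called. If every value is False, the problem would be in the downloaded file rather than in the splitting code.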