I am trying to fetch GenBank files for a list of accession IDs, which are stored in a file, using Biopython. This is how I do it so far:
#!/usr/bin/env python
from sys import argv, stdout, exit
from Bio import SeqIO
from Bio import Entrez

Entrez.email = 'example@mail.com'


def searchInDb(searchFor):
    # Fetch the GenBank record for one accession ID and save it as <ID>.gb
    handle = Entrez.efetch(db='nucleotide', id=searchFor, rettype='gb')
    link = searchFor + ".gb"
    local_file = open(link, 'w')
    local_file.write(handle.read())
    handle.close()
    local_file.close()


if __name__ == '__main__':
    if len(argv) != 2:
        print('\tmissing file link')
        exit(1)
    name = argv[1]
    # The input file holds one accession ID per line
    with open(name, "r") as ins:
        for line in ins:
            ID = line.rstrip('\n')
            print("Getting gb file for " + ID)
            searchInDb(ID)
However, when I do it like this and later look at the .gb file, it is not complete: there is no information about the CDS features or anything else. I need exactly that information, because later I want to parse the gene locus tags, the positions of the CDS on the genome, and so on from the .gb file.
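For context, the later parsing step I have in mind would look roughly like the sketch below. The function names `cds_summary` and `cds_summary_from_file` are placeholders of mine, not part of Biopython, and I am assuming that each .gb file holds exactly one record and that the CDS features carry a `locus_tag` qualifier.

```python
def cds_summary(record):
    """Return (locus_tag, start, end, strand) for every CDS feature.

    `record` is expected to look like a Biopython SeqRecord: it has a
    `features` list whose items expose `type`, `qualifiers` and `location`.
    """
    rows = []
    for feature in record.features:
        if feature.type != "CDS":
            continue
        # Qualifier values are lists; fall back to None if there is no locus_tag
        locus_tag = feature.qualifiers.get("locus_tag", [None])[0]
        loc = feature.location
        # Biopython positions are 0-based, with an exclusive end
        rows.append((locus_tag, int(loc.start), int(loc.end), loc.strand))
    return rows


def cds_summary_from_file(path):
    """Parse one downloaded .gb file (assumed to hold a single record)."""
    from Bio import SeqIO  # imported here so cds_summary itself has no dependency
    return cds_summary(SeqIO.read(path, "genbank"))
```

With the files I currently download, a call such as `cds_summary_from_file("SOME_ID.gb")` (the file name is only a placeholder) would come back empty, because the records contain no CDS features at all.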
Does anyone know how I need to change my code to get the complete .gb file?