Fetch complete GenBank file using Biopython
4.7 years ago

I am trying to fetch GenBank files for a list of accession IDs stored in a file, using Biopython. This is how I do it so far:

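#!/usr/bin/env python

from sys import argv, stdout, exit
from Bio import SeqIO
from Bio import Entrez

Entrez.email = 'example@mail.com'

def searchInDb(searchFor):
    # Fetch the GenBank record for a single accession ID
    handle = Entrez.efetch(db='nucleotide', id=searchFor, rettype='gb')
    # The lines that open and write local_file were missing from the post;
    # the output file name used here is an assumption.
    local_file = open(searchFor + '.gb', 'w')
    local_file.write(handle.read())
    handle.close()
    local_file.close()

if __name__ == '__main__':
    if len(argv) != 2:
        exit(1)
    name = argv[1]

    # One accession ID per line in the input file
    with open(name, "r") as ins:
        for line in ins:
            ID = line.rstrip('\n')
            print("Getting gb file for", ID)
            searchInDb(ID)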


However, when I do it like this and later look at the .gb file, it is not complete: there is no information about the CDS features or anything else. I need exactly that information, because later I want to parse the gene locus tags and the positions of the CDS on the genome from the .gb file.

Does anyone know how I need to change my code to get the complete .gb file?

4.7 years ago

"it is not complete, I don't have any information about the CDS or anything"

Please give us some example accession numbers. Also, not all sequence records contain that information.


Yes, you are right. But when I download the .gb files for my accessions manually, I get the complete file, which is why I assumed my code is wrong. Take NC_021485, for example: with my code the .gb file is not complete.


Use rettype=gbwithparts:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NC_021485&retmode=xml&rettype=gbwithparts

However, the output is GenBank flat-file text; I don't know how to retrieve it as XML.
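
In Biopython terms, the suggestion above amounts to changing the rettype argument of Entrez.efetch. The following is only a minimal sketch, not the poster's script: the helper names read_accessions and fetch_genbank, the out_path parameter, and the "<accession>.gb" naming scheme are assumptions made for illustration, and retmode="text" is passed explicitly so that the handle yields plain text.

```python
def read_accessions(lines):
    """Return the non-empty accession IDs, stripped of whitespace."""
    return [line.strip() for line in lines if line.strip()]


def fetch_genbank(accession, email, out_path=None):
    """Download one full GenBank record, including features and sequence."""
    # Biopython is imported here so read_accessions() works without it.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    # Hypothetical naming scheme: one "<accession>.gb" file per record
    out_path = out_path or accession + ".gb"
    handle = Entrez.efetch(db="nuccore", id=accession,
                           rettype="gbwithparts", retmode="text")
    try:
        with open(out_path, "w") as out:
            out.write(handle.read())
    finally:
        handle.close()
    return out_path
```

Under these assumptions, calling fetch_genbank("NC_021485", "example@mail.com") would write the record to NC_021485.gb in the current directory.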


Yes, I tried it, and it works so far. Thanks.
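
For the follow-up step mentioned in the question (pulling locus tags and CDS positions out of the downloaded file), a rough sketch with Bio.SeqIO could look like the code below. The function names cds_summary and cds_from_file are hypothetical, and it assumes the record was saved as a GenBank flat file with its feature table intact.

```python
def cds_summary(features):
    """Yield (locus_tag, start, end, strand) for each CDS feature."""
    for feature in features:
        if feature.type != "CDS":
            continue
        # qualifiers maps names to lists of strings; locus_tag may be absent
        locus_tag = feature.qualifiers.get("locus_tag", [None])[0]
        loc = feature.location
        # Biopython positions are 0-based and end-exclusive
        yield locus_tag, int(loc.start), int(loc.end), loc.strand


def cds_from_file(gb_path):
    """Collect CDS summaries from every record in a GenBank file."""
    # Imported here so cds_summary() can be used without Biopython installed.
    from Bio import SeqIO

    rows = []
    for record in SeqIO.parse(gb_path, "genbank"):
        rows.extend(cds_summary(record.features))
    return rows
```

For a spliced or otherwise compound location, start and end here span the whole feature; the individual segments would have to be read from the location's parts.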