Programmatically Downloading Complete Animal Genomes - Entrez Utils
9.2 years ago
moranr ▴ 290

Hi,

My goal is to download the complete nucleotide genomes for all metazoans.

I can get about half of these very easily from Ensembl Metazoa. For the rest of the species, I think I need to use the NCBI Entrez Utilities from Python.

My problem is selecting only completed genomes. It would also be fine to download every assembly for each species, as long as I end up with a single FASTA/GenBank file per genome/assembly.

At the moment I am doing the following:

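#Search Entrez and get IDs for each species

import csv
from Bio import Entrez

Entrez.email = 'you@example.com'  # placeholder address; NCBI asks for a contact email

genome_ids = []  # collect IDs across all species instead of overwriting them
with open('SpeciesList.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for sp in reader:
        # note the space before NOT, which was missing
        search_term = str(sp[0]) + '[orgn] complete genome[title] NOT mitochondria[title]'
        handle = Entrez.esearch(db='genome', term=search_term)
        genome_ids.extend(Entrez.read(handle)['IdList'])
        handle.close()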
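The boolean structure of the term above can be made explicit. The following is only a minimal sketch: `build_search_term` is a hypothetical helper name, the field tags are the ones already used in the query, and whether the `[title]` restriction actually selects finished genomes in a given database is an assumption that would need checking.

```python
def build_search_term(species, include_title="complete genome",
                      exclude_title="mitochondria"):
    """Return an Entrez term with explicit AND/NOT operators.

    Hypothetical helper: the species name and the title phrase are quoted
    so that multi-word values are treated as phrases.
    """
    species = species.strip()
    return '"{}"[orgn] AND "{}"[title] NOT {}[title]'.format(
        species, include_title, exclude_title
    )


if __name__ == "__main__":
    print(build_search_term(" Apis mellifera "))
```

Spelling out `AND` and keeping a space before `NOT` avoids the term being parsed differently from what was intended.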

##get gb files using ids

for genome_id in genome_ids:
    handle = Entrez.efetch(db="nucleotide", id=genome_id, rettype="gb", retmode="text")
    filename = 'genBankRecord_{}.gb'.format(genome_id)
    print('Writing: {}'.format(filename))
    with open(filename, 'w') as f:
        f.write(handle.read())
    handle.close()

##Parse gb files

My problem is retrieving gb files only for completed genomes. Can anyone help me with the search query, please?
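One direction that might fit the "one file per assembly" goal is to search the NCBI Assembly database rather than `genome`, and then build the download URL from each assembly's FTP directory. The sketch below rests on several assumptions: the function names are hypothetical, the `"complete genome"[filter]` and `latest[filter]` terms should be verified against the Assembly search help, and the example path in the usage note is made up. Note also that the UIDs returned by a search in one Entrez database are not valid IDs in another, so IDs from `db='genome'` cannot be passed directly to `efetch` on `db='nucleotide'`.

```python
import posixpath


def build_assembly_term(taxon="Metazoa"):
    """Search term for the Assembly database (filters are assumptions to verify)."""
    return '"{}"[Organism] AND "complete genome"[filter] AND latest[filter]'.format(taxon)


def genomic_file_url(ftp_path, suffix="_genomic.gbff.gz"):
    """Build the URL of one file inside an assembly's FTP directory.

    Files in the directory are named after the directory itself,
    e.g. <dir>/<dir-name>_genomic.gbff.gz (GenBank) or _genomic.fna.gz (FASTA).
    """
    ftp_path = ftp_path.rstrip("/")
    return "{}/{}{}".format(ftp_path, posixpath.basename(ftp_path), suffix)


def find_assembly_paths(taxon="Metazoa", email="you@example.com", retmax=500):
    """Return FTP directories for matching assemblies (needs network access)."""
    from Bio import Entrez  # imported lazily so the helpers above work without Biopython

    Entrez.email = email  # placeholder address
    handle = Entrez.esearch(db="assembly", term=build_assembly_term(taxon), retmax=retmax)
    ids = Entrez.read(handle)["IdList"]
    handle.close()

    paths = []
    for uid in ids:
        handle = Entrez.esummary(db="assembly", id=uid, report="full")
        summary = Entrez.read(handle, validate=False)
        handle.close()
        doc = summary["DocumentSummarySet"]["DocumentSummary"][0]
        # Prefer the RefSeq copy, fall back to the GenBank copy
        path = doc["FtpPath_RefSeq"] or doc["FtpPath_GenBank"]
        if path:
            paths.append(path)
    return paths
```

As a usage example, `genomic_file_url("ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/000/000/GCF_000000000.1_ExampleAsm")` would give the URL of the `..._genomic.gbff.gz` file in that (hypothetical) directory, and passing `suffix="_genomic.fna.gz"` would point at the FASTA file instead. Many animal assemblies are labelled at chromosome or scaffold level rather than as complete genomes, so the filter may need to be relaxed.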

sequence Entrez genome Python • 1.8k views