Question: Programmatically Downloading Complete Animal Genomes - Entrez Utils
Posted 5.4 years ago by moranr (Ireland):

Hi, 

My goal is to download all the complete nucleotide genomes for metazoans.

I can get about half of these easily from Ensembl Metazoa. However, for the remaining species I think I need to use NCBI's Entrez Utilities from Python.

My problem is selecting only complete genomes. It would be fine if all assemblies were downloaded for each species; I just want a single FASTA/GenBank file per genome assembly.
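
If the goal is one file per assembly, NCBI's Assembly database (`db='assembly'`) may be a simpler route than searching nucleotide records, because it has one entry per assembly and its document summaries carry an FTP directory path. That directory usually holds a single `<directory name>_genomic.fna.gz` with all genomic sequences of the assembly. The sketch below assumes that layout and assumes the filter strings `"complete genome"[filter]` and `latest[filter]` (worth checking in the Assembly search builder first); `genomic_fasta_url` and `complete_assembly_urls` are hypothetical helper names. Biopython is imported inside the network function only so the URL helper works on its own.

```python
def genomic_fasta_url(ftp_path):
    """Return the *_genomic.fna.gz URL inside an NCBI assembly FTP directory."""
    ftp_path = ftp_path.rstrip("/")
    basename = ftp_path.rsplit("/", 1)[-1]  # last path component, e.g. GCA_..._AsmName
    return "{}/{}_genomic.fna.gz".format(ftp_path, basename)


def complete_assembly_urls(species, email, retmax=100):
    """Search the Assembly database for one species and return genomic FASTA URLs."""
    from Bio import Entrez  # imported here so genomic_fasta_url() has no dependencies

    Entrez.email = email  # NCBI asks for a real contact address
    # ASSUMPTION: these filter names; verify them before relying on the results
    term = '"{}"[orgn] AND "complete genome"[filter] AND latest[filter]'.format(species)
    handle = Entrez.esearch(db="assembly", term=term, retmax=retmax)
    uids = Entrez.read(handle)["IdList"]
    handle.close()

    urls = []
    for uid in uids:
        handle = Entrez.esummary(db="assembly", id=uid, report="full")
        summary = Entrez.read(handle, validate=False)
        handle.close()
        docsum = summary["DocumentSummarySet"]["DocumentSummary"][0]
        # Prefer the GenBank copy; fall back to RefSeq if there is none
        ftp_path = docsum["FtpPath_GenBank"] or docsum["FtpPath_RefSeq"]
        if ftp_path:
            urls.append(genomic_fasta_url(ftp_path))
    return urls
```

For example, `complete_assembly_urls('Drosophila melanogaster', 'you@example.com')` should return a list of `.fna.gz` URLs that `urllib.request.urlretrieve` can download, one file per assembly. Relatively few animal assemblies reach the "Complete Genome" level, so dropping that filter (and keeping only `latest[filter]`) may be needed to get at least one assembly per species.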

At the moment I am doing the following:

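```python
import csv
from Bio import Entrez

Entrez.email = 'your.name@example.com'  # NCBI asks for a contact address

# Search Entrez and collect the IDs for every species
genome_ids = []
with open('SpeciesList.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for sp in reader:
        if not sp:
            continue  # skip blank lines in the CSV
        search_term = sp[0] + '[orgn] AND complete genome[title] NOT mitochondria[title]'
        # Search the same database we fetch from: UIDs are database-specific.
        # esearch returns only 20 IDs by default, so raise retmax.
        handle = Entrez.esearch(db='nucleotide', term=search_term, retmax=500)
        genome_ids.extend(Entrez.read(handle)['IdList'])  # accumulate, don't overwrite
        handle.close()

## Get GenBank files using the IDs
for genome_id in genome_ids:
    record = Entrez.efetch(db='nucleotide', id=genome_id, rettype='gb', retmode='text')
    filename = 'genBankRecord_{}.gb'.format(genome_id)
    print('Writing: {}'.format(filename))
    with open(filename, 'w') as f:
        f.write(record.read())
    record.close()

## Parse GenBank files
```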
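
The parsing step is still empty. A minimal sketch, assuming the `genBankRecord_<id>.gb` files written by the loop are in the current directory, is to keep records whose title looks like a complete, non-mitochondrial genome and write each surviving file out as FASTA with Biopython's `SeqIO`. `keep_record` and `genbank_files_to_fasta` are hypothetical names, and the title test is only a heuristic.

```python
import glob
import os


def keep_record(description):
    """Heuristic title check: a complete genome that is not mitochondrial."""
    text = description.lower()
    return "complete genome" in text and "mitochondri" not in text


def genbank_files_to_fasta(pattern="genBankRecord_*.gb"):
    """Write one FASTA file next to each GenBank file whose records pass keep_record()."""
    from Bio import SeqIO  # imported here so keep_record() needs no third-party packages

    written = []
    for gb_path in sorted(glob.glob(pattern)):
        # record.description holds the GenBank DEFINITION line
        records = [r for r in SeqIO.parse(gb_path, "genbank") if keep_record(r.description)]
        if records:
            fasta_path = os.path.splitext(gb_path)[0] + ".fasta"
            SeqIO.write(records, fasta_path, "fasta")
            written.append(fasta_path)
    return written
```

Writing one FASTA per input file keeps the "one file per genome" layout; concatenating everything into a single FASTA would only need one `SeqIO.write` call over a generator of all kept records.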

My problem is retrieving GenBank files only for complete genomes. Can anyone help me with my search query, please?
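
One likely issue with the query, as far as I know: GenBank titles for mitochondrial genomes usually read "... mitochondrion, complete genome", so `mitochondria[title]` would not exclude them. A hypothetical helper `build_search_term` that quotes the organism name and the title phrase, and uses Entrez `*` truncation to cover both "mitochondrion" and "mitochondrial", might look like this; it is a sketch, and the exact hit counts are worth checking in the NCBI Nucleotide search box first.

```python
def build_search_term(species):
    """Entrez nucleotide query: complete genomes for one organism, minus mitochondrial ones."""
    organism = species.strip()
    # Quotes keep multi-word names and phrases together; the trailing * is
    # Entrez truncation, intended to match "mitochondrion" and "mitochondrial".
    return '"{}"[orgn] AND "complete genome"[title] NOT mitochondri*[title]'.format(organism)
```

It would be used as `Entrez.esearch(db='nucleotide', term=build_search_term(sp[0]), retmax=500)` in place of the hand-built string.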

Tags: entrez, sequence, python, genome