Downloading Gene Annotation protein.faa files from Genome Accession numbers from NCBI via Entrez
8.0 years ago
ijc2 ▴ 10

I have a list of NCBI genome accession numbers of the form NC_####### and I want to download the protein FASTA files corresponding to the genome annotations for those accessions.

I have tried (using Python 2.7):

import os
from Bio import Entrez, SeqIO
Entrez.email = "email@example.com"
id_list = "NC_004757"
handle = Entrez.esearch(db="nuccore", term = id_list)
record = Entrez.read(handle)
gi_list = record["IdList"]
gi_str = ",".join(gi_list)    
handle = Entrez.efetch(db="nuccore", id=gi_str, rettype="fasta_cds_aa")
records = list(SeqIO.parse(handle, "fasta"))
for item in records:
    print(item.id)

But the runtime is so long that I believe there must be an issue. Any idea how I can access these genome annotation FASTA files in bulk?

entrez ncbi fasta python • 3.1k views
8.0 years ago
natasha.sernova ★ 4.0k

Try it without Python first, to make sure the files can be found where you expect them.

See my answer inside this post.

A: where can I get environmental bacteria genome in fasta format (as many as possib

This works for any organism in NCBI, not only for bacteria.

If NC_004757 is a real accession number, it belongs to a bacterium, so there is no problem.

NCBI has changed a lot, so make sure your files exist where you are looking for them.

Find the name of your bacterium in this file:

ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/assembly_summary_genbank.txt

Copy the respective URL into any browser, for example:

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000009145.1_ASM914v1/

You can download your .faa files from that directory.
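If you later want to script the lookup for many organisms, here is a minimal sketch of the same steps. It rests on assumptions that are not stated above: that the summary file is tab-separated with a header line naming the columns `organism_name` and `ftp_path`, and that the protein file in each assembly directory is named `<directory name>_protein.faa.gz`. The function names are made up for illustration. Check these assumptions against the file you actually download before relying on it.

```python
def parse_assembly_summary(text):
    """Yield one dict per assembly row, keyed by the header column names."""
    header = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            # Assumption: the last '#' line before the data holds the column names.
            header = line.lstrip("#").strip().split("\t")
            continue
        if header is None:
            raise ValueError("no header line found before the data rows")
        yield dict(zip(header, line.split("\t")))


def protein_faa_url(ftp_path):
    """Build the protein FASTA URL from an assembly directory URL.

    Assumption: the file is named <directory name>_protein.faa.gz.
    """
    ftp_path = ftp_path.rstrip("/")
    directory_name = ftp_path.rsplit("/", 1)[-1]
    return "{}/{}_protein.faa.gz".format(ftp_path, directory_name)


def find_protein_urls(text, organism_substring):
    """Return protein FASTA URLs for rows whose organism name matches."""
    wanted = organism_substring.lower()
    urls = []
    for row in parse_assembly_summary(text):
        path = row.get("ftp_path", "")
        name = row.get("organism_name", "")
        # Skip rows without a usable directory (empty or 'na').
        if wanted in name.lower() and path not in ("", "na"):
            urls.append(protein_faa_url(path))
    return urls
```

You would read the downloaded summary file into a string, call `find_protein_urls(text, "name of your bacterium")`, and pass the resulting URLs to whatever downloader you prefer. Matching is by organism name because, as described above, that is how you locate the row in the summary file.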
