Question: Downloading Gene Annotation protein.faa files from Genome Accession numbers from NCBI via Entrez
3.8 years ago, ijc2 wrote:

I have a list of NCBI genome accession numbers of the form: NC_####### and I want to download the protein fasta files corresponding to the genome annotations of the accession numbers.

I have tried (using Python 2.7):

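import os
from Bio import Entrez, SeqIO

Entrez.email = ""  # NCBI asks for a contact email address here
id_list = "NC_004757"
# Look up the record ID(s) for the accession number
handle = Entrez.esearch(db="nuccore", term=id_list)
record = Entrez.read(handle)
gi_list = record["IdList"]
gi_str = ",".join(gi_list)
# Fetch the translated CDS features as protein FASTA
handle = Entrez.efetch(db="nuccore", id=gi_str, rettype="fasta_cds_aa")
records = list(SeqIO.parse(handle, "fasta"))
for item in records:
    pass  # loop body not shown in the post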

But the runtime is so long I believe there must be an issue. Any idea on how I can access these genome annotation fasta files in bulk?

3.8 years ago, natasha.sernova wrote:

Start without Python, to make sure the files can actually be found where you expect them.

See my answer inside this post.

A: where can I get environmental bacteria genome in fasta format (as many as possib

It works for any organism in NCBI, not only bacteria.

If NC_004757 is a real accession number, it belongs to a bacterium, so there should be no problem.

The NCBI site has changed a lot, so make sure your files still exist where you are looking for them.

Find the name of your bacterium in this file:

Copy the corresponding URL into any browser.

You can then download your .faa files from that site.
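
The lookup described above could be scripted roughly as follows. This is only a sketch under assumptions: the post does not name the file, so the code assumes a tab-separated table laid out like NCBI's assembly summary files, with `organism_name` and `ftp_path` columns, and it assumes the protein file in each directory is named after the directory with a `_protein.faa.gz` suffix. The function name `protein_faa_urls` is made up for illustration, and the code is written for Python 3 rather than the Python 2.7 used in the question.

```python
import csv
import io


def protein_faa_urls(summary_text, organism):
    """Return protein FASTA URLs for rows whose organism name contains `organism`.

    Assumes a tab-separated table with 'organism_name' and 'ftp_path'
    columns, whose header line may be prefixed with '#'.
    """
    lines = []
    for line in summary_text.splitlines():
        if line.startswith("#"):
            # Keep only the comment line that names the columns.
            if "ftp_path" in line:
                lines.append(line.lstrip("# "))
            continue
        if line.strip():
            lines.append(line)

    reader = csv.DictReader(
        io.StringIO("\n".join(lines)), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    urls = []
    for row in reader:
        name = row.get("organism_name") or ""
        ftp_path = (row.get("ftp_path") or "").rstrip("/")
        if organism.lower() not in name.lower():
            continue
        if not ftp_path or ftp_path == "na":
            continue  # no download directory listed for this row
        # Assumed naming convention: <dir>/<dir basename>_protein.faa.gz
        base = ftp_path.rsplit("/", 1)[-1]
        urls.append(ftp_path + "/" + base + "_protein.faa.gz")
    return urls
```

Each returned URL can then be opened in a browser as described, or saved with `urllib.request.urlretrieve(url, filename)`. Matching is a case-insensitive substring test on the organism name, so a species name may return several strains.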
