Question: proteome download, human gut microbes
2.2 years ago by
Nitha10 wrote:

Hi All,

I have the names of more than 250 human gut microbes with their taxonomy IDs, e.g. "Bacteroides stercoris ATCC 43183", and downloading the whole-protein FASTA file for each of them takes a lot of time. I used Ensembl Bacteria to download the protein sequences of individual bacteria, but it is very slow. Can anyone suggest a grep or wget command line, or a Perl program, to download the proteins directly?

E.g., if we have the accession numbers of genes, we can use Batch Entrez to download the FASTA sequences for the whole list of IDs.



Are you interested in specific genes or the entire gene complement of the genomes? If you already know how to download the genome files (search for threads here), there is not much you can do about the download time. Depending on where you are in the world, that may be the best connection speed you are going to get.

If you have accession numbers and access to NCBI BLAST indexes, you can use the blastdbcmd utility from the BLAST+ package to extract those sequences quickly, like so: blastdbcmd -db /path_to/nr -entry_batch accession_number_file -outfmt '%f' -out sequence_you_need
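If you would rather drive this from a script, here is a minimal Python sketch of the same blastdbcmd call. The function names and the file names in the usage note are made up for illustration, and it assumes blastdbcmd is on your PATH and that you have a local BLAST protein database.

```python
import subprocess
from pathlib import Path


def write_accessions(accessions, batch_file):
    """Write one accession per line, the layout -entry_batch expects."""
    Path(batch_file).write_text("\n".join(accessions) + "\n")


def build_blastdbcmd(db, batch_file, out_fasta):
    """Build the argument list for blastdbcmd; '%f' asks for FASTA output."""
    return [
        "blastdbcmd",
        "-db", str(db),
        "-entry_batch", str(batch_file),
        "-outfmt", "%f",
        "-out", str(out_fasta),
    ]


def extract_sequences(db, batch_file, out_fasta):
    """Run blastdbcmd and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_blastdbcmd(db, batch_file, out_fasta), check=True)
```

You would then call something like extract_sequences("/path_to/nr", "accession_number_file", "sequence_you_need"), after creating the accession file by hand or with write_accessions.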

modified 2.2 years ago • written 2.2 years ago by genomax60k

No, my question is whether I can download the "whole protein sequence" of a particular bacterium in FASTA format using its name, in the same way that Batch Entrez lets us download any number of gene or protein sequences from a list of gene IDs or accession numbers.

written 2.2 years ago by Nitha10

You can use eutilities from NCBI. Take a look at the help doc here.
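As a rough illustration of the E-utilities route, the sketch below uses esearch to find every protein record under a taxonomy ID and then pulls the records down as FASTA with efetch in batches. The function names, batch size and pause are my own choices, not anything prescribed by NCBI. Note that this returns all protein records filed under that taxon, which may be more redundant than a curated proteome.

```python
import time
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(taxid):
    """URL that searches the protein database for one taxonomy ID."""
    params = {
        "db": "protein",
        "term": f"txid{taxid}[Organism:exp]",
        "usehistory": "y",  # keep the hits on the NCBI history server
    }
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_url(webenv, query_key, retstart=0, retmax=500):
    """URL that fetches one batch of the stored hits as FASTA text."""
    params = {
        "db": "protein",
        "WebEnv": webenv,
        "query_key": query_key,
        "retstart": retstart,
        "retmax": retmax,
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def parse_history(xml_text):
    """Pull the hit count, WebEnv and QueryKey out of an esearch XML reply."""
    root = ET.fromstring(xml_text)
    return (
        int(root.findtext("Count")),
        root.findtext("WebEnv"),
        root.findtext("QueryKey"),
    )


def download_proteins(taxid, out_path, batch=500, pause=0.4):
    """Write every protein record under taxid to out_path in FASTA format."""
    with urlopen(esearch_url(taxid)) as resp:
        count, webenv, key = parse_history(resp.read().decode())
    with open(out_path, "w") as out:
        for start in range(0, count, batch):
            with urlopen(efetch_url(webenv, key, start, batch)) as resp:
                out.write(resp.read().decode())
            time.sleep(pause)  # stay polite; NCBI rate-limits requests
    return count
```

With a two-column file of names and taxonomy IDs, you could loop over the rows and call download_proteins(taxid, f"{taxid}.fasta") for each one.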

written 2.2 years ago by genomax60k