Question: proteome download, human gut microbes
23 months ago, Nitha (10) wrote:

Hi All,

I have a list of more than 250 human gut microbes, each with its name and taxonomy ID (e.g. "Bacteroides stercoris ATCC 43183"). Downloading the complete protein FASTA file for each one individually takes a lot of time. I have been using Ensembl Bacteria to download each bacterium's protein sequences, but it is very slow. Can anyone suggest a command line (grep, wget) or a Perl program to download the proteins directly?

For example, if we have the accession numbers of genes, we can submit the list to Batch Entrez and download the FASTA sequences for all of the IDs at once.



Are you interested in specific genes or in the entire gene complement of each genome? If you already know how to download the genome files (search for existing threads here), there is not much you can do about the download time. Depending on where you are in the world, that may be the best connection speed you are going to get.

If you have accession numbers and access to NCBI BLAST indexes, you can use the blastdbcmd utility from the BLAST+ package to extract those sequences quickly, like so: blastdbcmd -db /path_to/nr -entry_batch accession_number_file -outfmt '%f' -out sequence_you_need

modified 23 months ago • written 23 months ago by genomax (57k)
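
To make the blastdbcmd suggestion above a bit more concrete, here is a minimal Python sketch that cleans up an accession list and builds the same command. The helper names (`clean_accessions`, `build_blastdbcmd_args`, `run_extraction`) and the file names are my own placeholders, not part of BLAST+, and it assumes blastdbcmd is on your PATH and that a local copy of the database exists at the path you pass in.

```python
import subprocess
from pathlib import Path


def clean_accessions(lines):
    """Strip whitespace and drop blank lines and duplicates, keeping order."""
    seen = []
    for line in lines:
        acc = line.strip()
        if acc and acc not in seen:
            seen.append(acc)
    return seen


def build_blastdbcmd_args(db, batch_file, out_fasta):
    """Return the blastdbcmd argument list for a batch FASTA extraction."""
    return [
        "blastdbcmd",
        "-db", str(db),                  # e.g. /path_to/nr
        "-entry_batch", str(batch_file), # one accession per line
        "-outfmt", "%f",                 # %f = sequence in FASTA format
        "-out", str(out_fasta),
    ]


def run_extraction(accessions, db, batch_file, out_fasta):
    """Write the accession file, then call blastdbcmd (must be installed)."""
    Path(batch_file).write_text("\n".join(clean_accessions(accessions)) + "\n")
    subprocess.run(build_blastdbcmd_args(db, batch_file, out_fasta), check=True)
```

You would call something like `run_extraction(my_ids, "/path_to/nr", "accession_number_file", "sequence_you_need")`. Note that this only helps if you already have protein accessions; it does not look anything up by organism name.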

No, my question is whether I can download the whole set of protein sequences of a particular bacterium in FASTA format using its name, in the same way that Batch Entrez lets us download any number of gene or protein sequences from a list of gene IDs or accession numbers.

written 23 months ago by Nitha (10)

You can use E-utilities from NCBI. Take a look at the E-utilities help documentation.

written 23 months ago by genomax (57k)
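
As a rough illustration of the E-utilities route mentioned above, the Python sketch below searches the NCBI protein database by taxonomy ID with esearch, keeps the result on the history server, and pages through efetch to write one FASTA file. The function names, the batch size of 500, and the 0.4 s pause are my own choices; this is one possible approach under those assumptions, not an official recipe, and the records returned may include more than a single reference proteome. NCBI asks users without an API key to stay under about three requests per second, hence the pause.

```python
import time
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(taxid):
    """URL that finds all protein records under a taxonomy ID."""
    params = {
        "db": "protein",
        "term": f"txid{taxid}[Organism:exp]",
        "usehistory": "y",  # keep results on the NCBI history server
        "retmax": 0,        # we only need the count and history keys
    }
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def parse_esearch(xml_text):
    """Return (count, WebEnv, QueryKey) from an esearch XML response."""
    root = ET.fromstring(xml_text)
    return (int(root.findtext("Count")),
            root.findtext("WebEnv"),
            root.findtext("QueryKey"))


def efetch_url(webenv, query_key, retstart=0, retmax=500):
    """URL that fetches one page of FASTA records from the history server."""
    params = {
        "db": "protein",
        "WebEnv": webenv,
        "query_key": query_key,
        "rettype": "fasta",
        "retmode": "text",
        "retstart": retstart,
        "retmax": retmax,
    }
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def download_proteins(taxid, out_path, batch=500, pause=0.4):
    """Download all protein FASTA records for one taxonomy ID (needs network)."""
    with urlopen(esearch_url(taxid)) as resp:
        count, webenv, key = parse_esearch(resp.read().decode())
    with open(out_path, "w") as out:
        for start in range(0, count, batch):
            time.sleep(pause)  # stay under the unauthenticated rate limit
            with urlopen(efetch_url(webenv, key, start, batch)) as resp:
                out.write(resp.read().decode())
    return count
```

For a list of 250 organisms you could loop over the taxonomy IDs and call `download_proteins(taxid, f"{taxid}.faa")` for each one, where the output file naming is again just a placeholder.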