Question

proteome download, human gut microbes

7.5 years ago

Nitha ▴ 20

Hi All,

I have a list of more than 250 human gut microbes, with their names and taxonomy IDs (e.g. "Bacteroides stercoris ATCC 43183"). Downloading the whole protein FASTA file for each of them one by one takes a lot of time. I used Ensembl Bacteria to download each bacterium's protein sequences, but it is very slow. Can anyone suggest a grep command line, a wget command, or a Perl program to download the proteins directly?

For example, if we have the accession numbers of genes, we can use Batch Entrez to download the FASTA sequences for the whole list of IDs at once.

Thanks!
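
The Batch Entrez workflow mentioned above can also be scripted against NCBI's E-utilities. The sketch below is a minimal illustration, not a tested pipeline. It assumes a plain-text file with one protein accession per line and sends the IDs to `efetch.fcgi` in batches, asking for FASTA output. The batch size of 200, the file names, and the 0.4 s pause are illustrative assumptions; NCBI asks unauthenticated clients to stay at or below 3 requests per second.

```python
import time
import urllib.parse
import urllib.request

# NCBI E-utilities base URL
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def chunks(items, size):
    """Yield successive batches of at most `size` IDs."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def efetch_url(ids, db="protein"):
    """Build an efetch URL that returns FASTA text for one batch of IDs."""
    params = {"db": db, "id": ",".join(ids), "rettype": "fasta", "retmode": "text"}
    return EUTILS + "efetch.fcgi?" + urllib.parse.urlencode(params)


def fetch_fasta(id_file, out_file, batch_size=200):
    """Read one accession per line from id_file and write all FASTA records to out_file."""
    with open(id_file) as fh:
        ids = [line.strip() for line in fh if line.strip()]
    with open(out_file, "w") as out:
        for batch in chunks(ids, batch_size):
            with urllib.request.urlopen(efetch_url(batch)) as resp:
                out.write(resp.read().decode())
            # pause between requests: NCBI asks for at most 3 requests/second without an API key
            time.sleep(0.4)
```

A call such as `fetch_fasta("accessions.txt", "proteins.faa")` would work on hypothetical file names. Batching keeps each URL short while still avoiding one request per sequence.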

sequence • 1.6k views

Are you interested in specific genes or the entire gene complement of the genomes? If you know how to download the genome files (search for threads here), then there is not much you can do about the time part. Depending on where you are in the world, that may simply be the best connection speed you are going to get.

If you have accession numbers and access to the NCBI BLAST indexes, you can use the blastdbcmd utility from the BLAST+ package to extract those sequences quickly, like so: blastdbcmd -db /path_to/nr -entry_batch accession_number_file -outfmt '%f' -out sequence_you_need

7.5 years ago by GenoMax 141k
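
If the extraction has to be repeated for many accession files (for example, one per organism), the blastdbcmd call from the reply above can be wrapped in a short script. This is a minimal sketch that assumes blastdbcmd is on the PATH and that a local BLAST database exists at the given path. It only uses the options shown in the reply: `-db`, `-entry_batch`, `-outfmt '%f'` (FASTA), and `-out`.

```python
import subprocess


def blastdbcmd_args(db, entry_batch, out):
    """Build the blastdbcmd argument list: FASTA output (%f) written to `out`."""
    return ["blastdbcmd", "-db", db, "-entry_batch", entry_batch,
            "-outfmt", "%f", "-out", out]


def extract(db, entry_batch, out):
    """Run blastdbcmd; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(blastdbcmd_args(db, entry_batch, out), check=True)
```

Passing the arguments as a list, rather than as one shell string, means the `%f` format and any unusual file names are handed to blastdbcmd without shell quoting issues.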

No, my question is whether I can download the whole set of protein sequences (the full proteome) in FASTA format for a particular bacterium using its name. For example, with gene IDs or accession numbers we can use Batch Entrez to download any number of gene or protein sequences.
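
Because the list already carries NCBI taxonomy IDs, one possible approach is to search the protein database by taxon instead of by accession. The minimal sketch below does this, assuming the E-utilities history server. `esearch.fcgi` runs the query `txid<ID>[Organism:noexp]` (only that exact taxon node; `[Organism:exp]` would also include child taxa) and stores the hits on NCBI's side. `efetch.fcgi` then pages through them as FASTA. The tab-separated `name<TAB>taxid` input layout, the page size, and the output file naming are hypothetical. The protein database can also hold redundant GenBank and RefSeq copies of the same protein, so record counts may exceed the number of genes.

```python
import csv
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# NCBI E-utilities base URL
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def taxon_term(taxid):
    """Entrez query for all protein records assigned to exactly this taxonomy ID."""
    return f"txid{taxid}[Organism:noexp]"


def esearch_url(taxid):
    """esearch URL that stores the hit list on the history server (usehistory=y)."""
    params = {"db": "protein", "term": taxon_term(taxid), "usehistory": "y", "retmax": 0}
    return EUTILS + "esearch.fcgi?" + urllib.parse.urlencode(params)


def parse_history(xml_text):
    """Return (WebEnv, QueryKey, Count) from an esearch XML reply."""
    root = ET.fromstring(xml_text)
    return root.findtext("WebEnv"), root.findtext("QueryKey"), int(root.findtext("Count"))


def efetch_history_url(webenv, query_key, retstart, retmax):
    """efetch URL for one page of FASTA records from a stored history query."""
    params = {"db": "protein", "query_key": query_key, "WebEnv": webenv,
              "rettype": "fasta", "retmode": "text",
              "retstart": retstart, "retmax": retmax}
    return EUTILS + "efetch.fcgi?" + urllib.parse.urlencode(params)


def get(url):
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode()


def download_proteome(taxid, out_path, page=500):
    """Write every protein FASTA record for one taxon to out_path."""
    webenv, key, count = parse_history(get(esearch_url(taxid)))
    time.sleep(0.4)  # stay under NCBI's 3 requests/second limit without an API key
    with open(out_path, "w") as out:
        for start in range(0, count, page):
            out.write(get(efetch_history_url(webenv, key, start, page)))
            time.sleep(0.4)


def download_all(table_path):
    """Fetch one proteome per row of a hypothetical tab-separated 'name<TAB>taxid' file."""
    with open(table_path, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            name, taxid = row[0].strip(), row[1].strip()
            download_proteome(taxid, name.replace(" ", "_") + ".faa")
```

With this sketch, `download_all("gut_microbes.tsv")` (a hypothetical file name) would produce one `.faa` file per organism, e.g. `Bacteroides_stercoris_ATCC_43183.faa`. Using the history server means only the search is sent once per organism, and the records are then fetched in pages rather than one request per protein.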