7.5 years ago
Nitha
Hi All,
I have more than 250 human gut microbe names with their taxonomy IDs (e.g. "Bacteroides stercoris ATCC 43183"), and downloading the whole-proteome protein FASTA file for each of them takes a lot of time. I used Ensembl Bacteria to download each bacterium's protein sequences, but it is very slow. Can anyone suggest a grep command line, a wget command, or a Perl program to download the proteins directly?
For example, if we have the accession numbers of genes, we can use Batch Entrez to download the FASTA sequences for a whole list of IDs at once.
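The Batch Entrez workflow described above can be approximated from a script with NCBI's EFetch E-utility. The following is a minimal Python sketch, assuming you already have a list of protein accessions: it splits them into batches and builds one EFetch URL per batch that returns plain-text FASTA. The batch size of 200 and the placeholder accessions are assumptions for illustration, not values from this thread.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def chunked(ids, size):
    """Yield successive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def efetch_fasta_urls(accessions, batch_size=200):
    """Build one EFetch URL per batch of protein accessions (FASTA, plain text)."""
    urls = []
    for batch in chunked(list(accessions), batch_size):
        params = {
            "db": "protein",
            "id": ",".join(batch),  # EFetch accepts a comma-separated ID list
            "rettype": "fasta",
            "retmode": "text",
        }
        urls.append(f"{EFETCH}?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    # Hypothetical placeholder accessions; replace with your own IDs.
    for url in efetch_fasta_urls(["ACCESSION_1", "ACCESSION_2"], batch_size=1):
        print(url)
```

Each printed URL can then be fetched with wget, or with urllib.request.urlopen from the same script.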
Thanks!
Are you interested in specific genes or the entire gene complement of these genomes? If you know how to download the genome files (search for threads here), then there is not much you can do about the time it takes. Depending on where you are in the world, that may simply be the best connection speed you are going to get.
If you have accession numbers and access to the NCBI BLAST indexes, you can use the blastdbcmd utility from the BLAST+ package to extract those sequences quickly, like so:
blastdbcmd -db /path_to/nr -entry_batch accession_number_file -outfmt '%f' -out sequence_you_need
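If you would rather drive blastdbcmd from a script (for example, to run it over many accession lists), here is a minimal Python sketch that builds the same command as an argument list and runs it with subprocess. It assumes blastdbcmd is on your PATH and that /path_to/nr is a local BLAST database; the file names are the placeholders from the command above.

```python
import subprocess
from pathlib import Path


def blastdbcmd_args(db, entry_batch, out):
    """Argument list that extracts the IDs in `entry_batch` from a BLAST db as FASTA."""
    return [
        "blastdbcmd",
        "-db", str(db),
        "-entry_batch", str(entry_batch),
        "-outfmt", "%f",  # %f = sequence in FASTA format; no shell quoting needed in a list
        "-out", str(out),
    ]


def extract_sequences(db, accessions, batch_file, out):
    """Write one accession per line to `batch_file`, then run blastdbcmd on it."""
    Path(batch_file).write_text("\n".join(accessions) + "\n")
    subprocess.run(blastdbcmd_args(db, batch_file, out), check=True)
```

Passing a list instead of a shell string avoids having to quote the '%f' format specifier.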
No, my question is whether I can download the whole protein sequence set, in FASTA format, for a particular bacterium using its name. For example, with gene IDs or accession numbers we can use Batch Entrez to download any number of gene or protein sequences.
You can use the E-utilities from NCBI. Take a look at the help documentation here.
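As a rough illustration of the E-utilities approach, here is a minimal Python sketch, not a tested pipeline: ESearch queries the protein database for one organism (by name or taxonomy ID) and stores the result on NCBI's history server, then EFetch pages through that stored set and writes FASTA. The helper names, the page size of 500, and the 0.4 s delay are assumptions; the delay is meant to stay under NCBI's limit of about 3 requests per second without an API key.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def organism_term(name=None, taxid=None):
    """Entrez query restricting a search to one organism (taxid preferred if given)."""
    if taxid is not None:
        return f"txid{taxid}[Organism:noexp]"
    return f'"{name}"[Organism]'


def esearch_url(term):
    """ESearch on the protein db, keeping results on the history server."""
    params = {"db": "protein", "term": term, "usehistory": "y", "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_url(webenv, query_key, retstart, retmax):
    """EFetch one page of the stored result set as plain-text FASTA."""
    params = {
        "db": "protein", "WebEnv": webenv, "query_key": query_key,
        "rettype": "fasta", "retmode": "text",
        "retstart": retstart, "retmax": retmax,
    }
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def download_proteins(term, out_path, page=500, delay=0.4):
    """Search once, then page through the stored results, writing FASTA to out_path."""
    time.sleep(delay)  # throttle: NCBI allows ~3 requests/second without an API key
    with urlopen(esearch_url(term)) as resp:
        result = json.load(resp)["esearchresult"]
    count = int(result["count"])
    with open(out_path, "w") as out:
        for start in range(0, count, page):
            time.sleep(delay)
            url = efetch_url(result["webenv"], result["querykey"], start, page)
            with urlopen(url) as resp:
                out.write(resp.read().decode())
    return count
```

For example, download_proteins(organism_term(name="Bacteroides stercoris ATCC 43183"), "B_stercoris.faa") would write every protein record the search matches; for 250 organisms you would call it once per name or taxonomy ID in a loop. An organism search of the protein database may also return redundant records, so the result is not guaranteed to match a curated reference proteome.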