Hello everyone, I'm looking for bash code to download all the protein FASTA sequences for the bacterial and protist proteomes from UniProt Proteomes. Does anyone know how I can do this, please?
This help page on the UniProt website https://www.uniprot.org/help/api_downloading includes a code example to "Download the UniProt reference proteomes for all organisms below a given taxonomy node in compressed FASTA format"
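A rough bash sketch of that idea, in case it helps. It is not the code from the help page. It assumes the REST endpoints at rest.uniprot.org, the query fields taxonomy_id, proteome_type:1 (reference proteomes) and proteome, and the format=list output for proteomes. Please check all of these against the current API documentation before relying on them. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: the endpoints and query fields below are assumptions to verify
# against the current UniProt API documentation.

# URL that lists reference proteome IDs (one UP... ID per line) under a taxon.
proteome_list_url() {
  local taxid="$1"
  # URL-encoded form of: (taxonomy_id:TAXID) AND (proteome_type:1)
  printf 'https://rest.uniprot.org/proteomes/stream?format=list&query=%%28taxonomy_id%%3A%s%%29%%20AND%%20%%28proteome_type%%3A1%%29\n' "$taxid"
}

# URL that streams all proteins of one proteome as gzip-compressed FASTA.
proteome_fasta_url() {
  local upid="$1"
  printf 'https://rest.uniprot.org/uniprotkb/stream?format=fasta&compressed=true&query=proteome%%3A%s\n' "$upid"
}

# Download one .fasta.gz file per reference proteome into OUTDIR.
download_reference_proteomes() {
  local taxid="$1" outdir="$2" upid
  mkdir -p "$outdir"
  curl --fail --silent --show-error "$(proteome_list_url "$taxid")" |
    while read -r upid; do
      [ -n "$upid" ] || continue   # skip empty lines
      curl --fail --silent --show-error \
        --output "$outdir/$upid.fasta.gz" "$(proteome_fasta_url "$upid")"
      sleep 1                      # pause between requests
    done
}
```

For Bacteria (NCBI taxonomy ID 2) the call would be something like download_reference_proteomes 2 bacteria_proteomes. Protists are not a single taxonomy node, so you would probably need to run it once per protist lineage ID.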
Not protists but you can download bacterial sequences from this page. Whole genome proteomes for Bacteria are here.
Hello, I downloaded the file https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/uniprot_sprot_bacteria.dat.gz and converted the .dat into .fasta using the Python module Bio.SwissProt, but I only get 335,066 bacterial FASTA sequences, despite the fact that when I type
taxonomy:bacteria
in the search box on UniProt I get up to 151,792,141 bacterial sequences. Do you know why?
You have a better solution provided by Elisabeth Gasteiger below.
You can use seqret from EMBOSS to convert the dat files to FASTA. I am not sure why you get a smaller number of entries. Perhaps redundant sequences are represented only once.
OK, I see. In fact I only downloaded the Swiss-Prot part and not the TrEMBL part. I will check whether the number of entries is right once TrEMBL is included.
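For checking the numbers, a small bash sketch that counts entries before and after conversion. It relies on each entry in the UniProt flat-file (.dat) format starting with a line that begins with "ID" followed by three spaces, and on each FASTA record having one header line starting with ">". The function names are just for illustration.

```shell
#!/usr/bin/env bash
# Count entries in a gzip-compressed UniProt flat file (.dat.gz).
# Each entry starts with a line beginning with "ID   ".
count_dat_entries() {
  gzip -dc "$1" | grep -c '^ID   '
}

# Count sequences in an uncompressed FASTA file.
# Each sequence has exactly one header line starting with ">".
count_fasta_entries() {
  grep -c '^>' "$1"
}
```

For example, count_dat_entries uniprot_sprot_bacteria.dat.gz and count_fasta_entries on the converted file should print the same number if no entries were lost in the conversion. The TrEMBL counterpart of the Swiss-Prot file should be in the same taxonomic_divisions directory and is far larger, so counting it will take a while.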