Hello everyone, I'm looking for a bash script to download from UniProt Proteomes all the protein FASTA sequences for bacterial and protist proteomes. Does anyone know how I can do this, please?
Not protists but you can download bacterial sequences from this page. Whole genome proteomes for Bacteria are here.
Hello, I downloaded the file https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/uniprot_sprot_bacteria.dat.gz and converted the .dat into .fasta using the Python module Bio.SwissProt, but I only get 335,066 bacterial FASTA sequences, even though searching for taxonomy:bacteria on UniProt returns up to 151,792,141 bacterial sequences. Do you know why?
You have a better solution provided by Elisabeth Gasteiger below.
You can use seqret from EMBOSS to convert the dat files to fasta. I am not sure why you get a smaller number of entries. Perhaps redundant sequences are represented only once.
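If neither EMBOSS nor Biopython is at hand, the same conversion can be sketched with gzip and awk. This is a simplified stand-in, not what seqret does internally: the function name dat_to_fasta and the header layout (>accession|entry_name) are assumptions made for illustration, and the description part of a real UniProt FASTA header is not reproduced.

```shell
# Hypothetical helper: convert a gzipped UniProtKB flat file (.dat.gz) to FASTA on stdout.
# Assumes the standard flat-file layout: ID and AC lines, an SQ header line,
# indented sequence lines, and "//" closing each entry.
dat_to_fasta() {
    gzip -cd -- "$1" | awk '
        /^ID   /               { id = $2 }                        # entry name
        /^AC   / && acc == ""  { acc = $2; sub(/;$/, "", acc) }   # primary accession only
        /^SQ   /               { in_seq = 1; print ">" acc "|" id; next }
        /^\/\//                { in_seq = 0; acc = ""; id = ""; next }
        in_seq                 { gsub(/ /, ""); print }           # strip the column spacing
    '
}
```

It could be called as: dat_to_fasta uniprot_sprot_bacteria.dat.gz > uniprot_sprot_bacteria.fasta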
OK, I see. In fact I only downloaded the Swiss-Prot part and not the TrEMBL part; I will check whether the number of entries is right once I have both.
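To check that explanation, both sections can be fetched and their entries counted. The sketch below is hedged: the TrEMBL file name uniprot_trembl_bacteria.dat.gz is assumed by analogy with the Swiss-Prot URL quoted above, and the function names are made up for illustration. The TrEMBL file is far larger than the Swiss-Prot one, so check disk space first.

```shell
# Base URL taken from the Swiss-Prot link earlier in the thread.
BASE_URL="https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions"

# Hypothetical helper: build the URL for one section ("sprot" or "trembl").
# The TrEMBL file name is an assumption modelled on the Swiss-Prot one.
division_url() {
    printf '%s/uniprot_%s_bacteria.dat.gz\n' "$BASE_URL" "$1"
}

# Hypothetical helper: download one section, resuming a partial file if present.
fetch_division() {
    curl -fL -C - -O "$(division_url "$1")"
}

# Count entries in a gzipped flat file: every entry begins with an "ID" line.
count_dat_entries() {
    gzip -cd -- "$1" | grep -c '^ID   '
}
```

For example: fetch_division sprot, then count_dat_entries uniprot_sprot_bacteria.dat.gz, and the same with trembl. The two counts added together are what should be compared with the website total.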
This help page on the UniProt website, https://www.uniprot.org/help/api_downloading, includes a code example to "Download the UniProt reference proteomes for all organisms below a given taxonomy node in compressed FASTA format".
Great, I'll try that one, thanks.
Hi, I used this technique, but in the end I only found 1,335,574 FASTA sequences instead of 151,792,141. Any idea why?
I used the following command: perl perl_test.pl 2 (where perl_test.pl is the code from the UniProt web page).
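To double-check a total like the one above, the FASTA header lines can be counted directly. This sketch assumes the script left one gzipped FASTA file per proteome in the current directory (the *.fasta.gz pattern is an assumption about its output naming), and count_fasta_records is a made-up helper name. Note that the help-page example targets reference proteomes only, so its total need not match a taxonomy:bacteria search over all of UniProtKB.

```shell
# Hypothetical helper: total number of FASTA records across gzipped files.
# Each record starts with a ">" header line.
count_fasta_records() {
    for f in "$@"; do
        gzip -cd -- "$f"
    done | grep -c '^>'
}
```

It could be called as: count_fasta_records ./*.fasta.gz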