Question

Download all bacterial and protist protein FASTA sequences from UniProt proteomes

1


3.3 years ago

Chvatil ▴ 130

Hello everyone, I'm looking for a bash script to download all protein FASTA sequences from the UniProt bacterial and protist proteomes. Does anyone know how I can do this?

uniprot fetch fasta proteome bash • 2.4k views


1


Not protists but you can download bacterial sequences from this page. Whole genome proteomes for Bacteria are here.

3.3 years ago by GenoMax 147k

0


Hello, I downloaded the file https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/uniprot_sprot_bacteria.dat.gz and converted the .dat to .fasta using the Python module Bio.SwissProt, but I only get 335,066 bacterial FASTA sequences, even though searching UniProt for taxonomy:bacteria returns up to 151,792,141 bacterial sequences. Do you know why?
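A minimal sketch of the .dat-to-FASTA conversion described above, written with only the Python standard library instead of Bio.SwissProt (so it is a stand-in, not the code used in this post). It assumes the UniProtKB flat-file layout in which each entry has an `ID` line, an `AC` line, an `SQ` header followed by indented sequence lines, and a closing `//`. The function name `dat_to_fasta`, the simplified `>accession|entry_name` header, and the toy record are made up for illustration.

```python
import io


def dat_to_fasta(dat_handle, fasta_handle, width=60):
    """Convert UniProtKB flat-file text to FASTA; return the number of entries."""
    count = 0
    entry_name = accession = None
    in_sequence = False
    residues = []
    for line in dat_handle:
        if line.startswith("ID   "):
            entry_name = line.split()[1]
        elif line.startswith("AC   ") and accession is None:
            # Keep only the primary (first) accession.
            accession = line[5:].split(";")[0].strip()
        elif line.startswith("SQ   "):
            in_sequence = True
        elif line.startswith("//"):
            # End of entry: write the header, then the wrapped sequence.
            seq = "".join(residues)
            fasta_handle.write(f">{accession}|{entry_name}\n")
            for i in range(0, len(seq), width):
                fasta_handle.write(seq[i:i + width] + "\n")
            count += 1
            entry_name = accession = None
            in_sequence = False
            residues = []
        elif in_sequence:
            # Sequence lines are indented and split into blocks of 10 residues.
            residues.append(line.replace(" ", "").strip())
    return count


# Toy, made-up record (not a real UniProt entry) to show the call.
toy = io.StringIO(
    "ID   TEST_ECOLI   Reviewed;   12 AA.\n"
    "AC   P00000;\n"
    "SQ   SEQUENCE   12 AA;  1234 MW;  0000000000000000 CRC64;\n"
    "     MKTAYIAKQR QI\n"
    "//\n"
)
out = io.StringIO()
n_entries = dat_to_fasta(toy, out)
print(n_entries, out.getvalue(), sep="\n")
```

For the real download, opening the file with `gzip.open("uniprot_sprot_bacteria.dat.gz", "rt")` and passing that handle as `dat_handle` should work under the same assumptions.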

3.3 years ago by Chvatil ▴ 130

0


You have a better solution provided by Elisabeth Gasteiger below.

You can use seqret from EMBOSS to convert the dat files to fasta. I am not sure why you get a smaller number of entries. Perhaps redundant sequences are represented only once.

3.3 years ago by GenoMax 147k

0


OK, I see. In fact, I only downloaded the Swiss-Prot part and not the TrEMBL part. I will check whether the number of entries is right once I include it.
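One way to check the entry count after converting or downloading is to count FASTA header lines. A small sketch, assuming one `>` header per sequence; the function name `count_fasta_records` and the file names in the comment are hypothetical.

```python
import gzip


def count_fasta_records(path):
    """Count sequences in a FASTA file (plain or gzip-compressed)."""
    # Pick the opener from the file extension; "rt" yields text lines either way.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))


# Example with hypothetical file names: Swiss-Prot alone vs. Swiss-Prot plus TrEMBL.
# total = (count_fasta_records("sprot_bacteria.fasta")
#          + count_fasta_records("trembl_bacteria.fasta.gz"))
```

Reading line by line keeps memory use flat, which matters for files with tens of millions of entries.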

3.3 years ago by Chvatil ▴ 130

score 2 · Accepted Answer · 2021-07-28

2


3.3 years ago

Elisabeth Gasteiger ★ 2.4k

This help page on the UniProt website, https://www.uniprot.org/help/api_downloading, includes a code example to "Download the UniProt reference proteomes for all organisms below a given taxonomy node in compressed FASTA format".
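The same idea (list the reference proteomes under a taxonomy node, then fetch each one as compressed FASTA) can be sketched in Python. This is not the script from the help page. The two endpoint URLs, the query fields `taxonomy_id`, `proteome_type:1` and `proteome`, and the `format`/`compressed` parameters are assumptions about the current UniProt REST interface and should be checked against the help page before use. All function names are hypothetical.

```python
import os
import shutil
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: these endpoints and query fields may differ from what UniProt serves.
PROTEOMES_URL = "https://rest.uniprot.org/proteomes/stream"
UNIPROTKB_URL = "https://rest.uniprot.org/uniprotkb/stream"


def proteome_list_url(taxonomy_id):
    """URL listing reference proteome IDs below a taxonomy node (assumed syntax)."""
    query = f"(taxonomy_id:{taxonomy_id}) AND (proteome_type:1)"
    return PROTEOMES_URL + "?" + urlencode({"query": query, "format": "list"})


def proteome_fasta_url(proteome_id):
    """URL for one proteome's proteins as gzip-compressed FASTA (assumed syntax)."""
    params = {
        "query": f"(proteome:{proteome_id})",
        "format": "fasta",
        "compressed": "true",
    }
    return UNIPROTKB_URL + "?" + urlencode(params)


def download_reference_proteomes(taxonomy_id, out_dir="."):
    """Save one <proteome_id>.fasta.gz per reference proteome; return the IDs."""
    with urlopen(proteome_list_url(taxonomy_id)) as response:
        proteome_ids = response.read().decode().split()
    for proteome_id in proteome_ids:
        target = os.path.join(out_dir, f"{proteome_id}.fasta.gz")
        with urlopen(proteome_fasta_url(proteome_id)) as response:
            with open(target, "wb") as out:
                # Stream to disk instead of holding the whole file in memory.
                shutil.copyfileobj(response, out)
    return proteome_ids


# Example (network access required, not run here); 2 is the taxonomy ID for Bacteria.
# download_reference_proteomes(2, out_dir="bacteria_proteomes")
```

Note that this retrieves reference proteomes only, so the total is expected to be smaller than a search over all UniProtKB entries for the same taxon.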


0


Great, I'll try that one, thanks.

3.3 years ago by Chvatil ▴ 130

0


Hi, I used this technique, but in the end I only got 1,335,574 FASTA sequences instead of 151,792,141. Any idea why?

I used the following command: perl perl_test.pl 2 (where perl_test.pl is the code from the UniProt web page).

3.3 years ago by Chvatil ▴ 130