8.9 years ago
kevinjspring ▴ 20

I have been trying to use Biopython to parse out certain domains from proteins, and it was suggested that I use the Bio.SwissProt module. Unfortunately, I don't see any SwissProt data files available on UniProt. The only available file formats are GFF, FASTA, XML, and TXT. Does anyone know how I can get access to the Swiss-Prot file format?

TXT is what you want.

I used a slight workaround: I pulled the accession numbers from UniProt and then used Biopython's ExPASy module to fetch the data.

8.9 years ago
Hamish ★ 3.2k

The "text" files (also known as 'dat' files) are the files in UniProtKB/Swiss-Prot format, so you can fetch these with:

wget 'ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_*.dat.gz'


or using one of the many mirrors:

wget 'ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_*.dat.gz'


Note: the UniProtKB/TrEMBL file is large (approx. 20GB compressed and about 110GB uncompressed), so you will likely want to download it only if you really need it. See "Why is UniProtKB composed of 2 sections, UniProtKB/Swiss-Prot and UniProtKB/TrEMBL?" for an overview of the differences between UniProtKB/Swiss-Prot and UniProtKB/TrEMBL.

If you need the whole database, fetches like the ones above are recommended.
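
Once a .dat file is downloaded, each entry is a block of lines with two-letter line codes, terminated by a line containing only `//`. As a rough, standard-library-only sketch of what reading it involves (Biopython's Bio.SwissProt.parse does this far more completely), the following pulls accessions and DOMAIN features out of each entry. The helper names and the sample entry are made up for illustration, and the regular expression assumes the current `start..end` feature layout with the `/note` qualifier on the next line; older releases used a column-based FT layout that this would not match.

```python
import io
import re

# Hypothetical, heavily truncated entry used only to exercise the sketch.
SAMPLE = """\
ID   TEST_HUMAN              Reviewed;         120 AA.
AC   P00001; Q00002;
FT   DOMAIN          10..60
FT                   /note="SH3"
//
"""

DOMAIN_RE = re.compile(r"^FT   DOMAIN\s+(\d+)\.\.(\d+)")
NOTE_RE = re.compile(r'^FT\s+/note="([^"]*)"')


def iter_records(handle):
    """Yield each entry as a list of lines; an entry ends at a '//' line."""
    lines = []
    for line in handle:
        line = line.rstrip("\n")
        if line == "//":
            if lines:
                yield lines
            lines = []
        else:
            lines.append(line)


def accessions(lines):
    """Collect all accession numbers from the AC line(s) of one entry."""
    found = []
    for line in lines:
        if line.startswith("AC   "):
            found.extend(a.strip() for a in line[5:].split(";") if a.strip())
    return found


def domains(lines):
    """Return (start, end, note) for each DOMAIN feature with exact positions."""
    found = []
    for i, line in enumerate(lines):
        match = DOMAIN_RE.match(line)
        if not match:
            continue
        note = ""
        # Assumption: a single-line /note qualifier directly follows the feature.
        if i + 1 < len(lines):
            note_match = NOTE_RE.match(lines[i + 1])
            if note_match:
                note = note_match.group(1)
        found.append((int(match.group(1)), int(match.group(2)), note))
    return found


records = list(iter_records(io.StringIO(SAMPLE)))
```

For a real download, the same functions can be fed a handle from `gzip.open(path, "rt")` instead of the `io.StringIO` sample, which avoids uncompressing the file on disk. Features with fuzzy positions (for example `<1..50`) are skipped by this sketch.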

UniProt also provides subsets of the database, which may be more appropriate if you are only interested in certain organisms.

For specific entries, where you already have a list of identifiers or accessions, the various web services providing access to the UniProtKB data are more appropriate. For example:
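
One way this could look, as a minimal sketch rather than necessarily the service the answer had in mind: fetch a single entry in the flat-file ("txt") format over HTTP by accession. The base URL below is an assumption about the present-day UniProt REST layout (the service addresses have changed over the years), and `entry_url` and `fetch_entry` are hypothetical helper names.

```python
import urllib.request

# Assumption: present-day UniProt REST endpoint; check the UniProt help pages
# if this layout has changed.
BASE_URL = "https://rest.uniprot.org/uniprotkb"


def entry_url(accession, fmt="txt"):
    """Build the URL for one entry; 'txt' is the Swiss-Prot flat-file format."""
    return f"{BASE_URL}/{accession.strip()}.{fmt}"


def fetch_entry(accession, fmt="txt", timeout=30):
    """Download one entry and return it as text (requires network access)."""
    with urllib.request.urlopen(entry_url(accession, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_entry("P12345")` should return text that Bio.SwissProt can read when wrapped in `io.StringIO`. For more than a handful of accessions, a short pause between requests is polite to the server.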

8.9 years ago
Prakki Rama ★ 2.6k

Hi, is this what you are looking for?

No, it seems like I need to use the ExPASy module to pull the records from the ExPASy database: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc136