Swiss-Prot human count from the archives
4.8 years ago
cdsouthan ★ 1.9k

Human Swiss-Prot Latest release 2017_01 = 20,171

I'd like a retro-backfill for 10 years or so, but only need maybe 2 releases per year for a nice chart. I'm guessing this is not simply available from an interface query box.

If anyone can supply this, they are welcome to an acknowledgment in the protein number review I am just writing.

(getting the same data from neXtProt is fine also)

uniprot Swiss-Prot human proteins • 1.1k views
Why not email their support and ask (help at uniprot.org)? They probably have this information available.

Old UniProt releases are here (starting from release 1.0)

EDIT: See my answer below for links to the stats.

I have a second question but closely related to this one. The plot of total Swiss-Prot (or human) vs year shows almost an asymptote after about 2009. What is the cause of this?

The effect you are seeing from 2009 onward - a slowing of the rate of growth of the number of reviewed UniProtKB/Swiss-Prot entries - is due to a deliberate change in our curation policies.

Prior to 2009, we were using the HAMAP system for the rapid annotation by homology of uncharacterized protein sequences in UniProtKB/TrEMBL. HAMAP uses a rule-based system that leverages experimentally characterized templates from UniProtKB/Swiss-Prot (see our paper for more details). UniProtKB/TrEMBL entries annotated by HAMAP were subject to spot checks and subsequently integrated into UniProtKB/Swiss-Prot.

From 2009 the pipeline for UniProtKB/TrEMBL annotation was modified to include HAMAP data, and since that time all entries annotated by HAMAP have been made available as part of UniProtKB/TrEMBL without any further review or checks. HAMAP mimics many of the checks performed by Swiss-Prot curators, and providing the HAMAP data in UniProtKB/TrEMBL without further review allows curators to concentrate on the curation of experimental data from the literature. This has allowed us to develop our curation workflows in other ways; 2009 also marked the year in which the Swiss-Prot group began to systematically curate Gene Ontology terms for all proteins, and the group now contributes some 25,000 annotations per year to the GO.

Thanks. So, to paraphrase: HAMAP makes a big improvement to TrEMBL, allowing Swiss-Prot to increase the curation time per record, hence the apparent slowdown in growth but a gain in quality/utility.

If you are referring to human entries, I guess no new proteins have been added since 2009, or the numbers of proteins removed and added are in equilibrium.

If you are referring to the total number of entries (any species) in Swiss-Prot, I doubt it has been asymptotic since 2009. I would expect it to keep growing. Curators haven't stopped working.

I have checked http://web.expasy.org/docs/relnotes/relstat.html and there has been a slowdown in the incorporation of entries into Swiss-Prot since 2009. I do not know why. They may have decided to limit the incorporation of more whole proteomes. Depending on what you want, it might be better to look at the entire UniProt database rather than restricting to the "curated" entries in UniProtKB (Swiss-Prot).

Old release notes have protein numbers by species. They can be found here.

EDIT: All stats can be found here.

abascalfederico ★ 1.2k

You can read the "DT" lines from the latest UniProt release to analyse the contents of the database over time:

DT   21-JUL-1986, integrated into UniProtKB/Swiss-Prot.
DT   21-JUL-1986, sequence version 1.
DT   18-JAN-2017, entry version 206.

Seems to work, just the job, thanks. Actually obvious but you thought of it.

n.b. graph is now on twitter

