Question: What Is The Species Distribution Of Uniprot
6.4 years ago by
mtyler.jason110 wrote:

I want to know about the species distribution of UniProt. How many human proteins does UniProt have? How many from other species? Is there any way to get this information for the whole UniProt protein database?

6.4 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum128k wrote:
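# Tally the OS (organism species) lines of a gzipped UniProtKB flat file,
# sorted so that the organisms with the most lines come last.
$ curl -s "" |\
gunzip -c | grep -E '^OS ' | cut -c6- | sort | uniq -c | sort -n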

   4127 Dictyostelium discoideum (Slime mold).
   4185 Bacillus subtilis (strain 168).
   4431 Escherichia coli (strain K12).
   5097 Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast).
   5983 Bos taurus (Bovine).
   6621 Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast).
   7875 Rattus norvegicus (Rat).
  12545 Arabidopsis thaliana (Mouse-ear cress).
  16642 Mus musculus (Mouse).
  20273 Homo sapiens (Human).
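
The one-liner above counts OS lines, so an organism name that wraps onto a second OS line would be tallied as two separate strings. A minimal sketch of a per-entry variant follows, assuming a UniProtKB flat file on standard input in which data starts at column 6 and each entry ends with a `//` line. The function name `count_species` and the local file name in the usage note are assumptions, not part of the original answer.

```shell
# Hypothetical helper: print one organism name per entry, then tally.
count_species() {
    awk '
        /^OS / {
            # Organism line; a long name may continue on further OS lines.
            part = substr($0, 6)
            name = (name == "") ? part : (name " " part)
        }
        /^\/\// {
            # End of entry: emit the joined name and reset for the next entry.
            if (name != "") print name
            name = ""
        }
    ' | sort | uniq -c | sort -n
}
```

With a locally downloaded, gzipped flat file (here assumed to be called `uniprot_sprot.dat.gz`), `gunzip -c uniprot_sprot.dat.gz | count_species | tail` would show the ten most frequent organisms.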
6.4 years ago by
Hamish3.1k wrote:

A place to start is the UniProt statistics pages:

These include details of the taxonomic distribution of the current UniProtKB entries.

6.4 years ago by
Jerven650 wrote:

UniProt's browse-by-taxonomy view is a way to explore the taxonomic distribution of all of UniProtKB. However, because UniProt uses the NCBI taxonomy, there are things in there that can surprise the unaware biologist. For example, Homo sapiens has two subspecies, neanderthalensis and ssp. Denisova (don't ask me why, it just is...). Another is that, up to now, there was basically a one-to-one mapping between taxid and genome project for bacterial species/strains/subspecies, which is going to change soon.


UniProt uses a modified version of the NCBI Taxonomy (see UniProt Taxonomy) which:

  • Uses an alternative authority for some taxa. Different scientific and common names are therefore used for those taxa in UniProt, and the names used in NCBI Taxonomy (and thus in INSDC) are handled as synonyms.
  • Adds taxa. Since UniProt receives submissions of protein sequences, it sometimes has to provide a taxonomy node before the organism is available in the NCBI Taxonomy.

The taxonomy identifiers (e.g. 9606 for Homo sapiens) should be consistent between the two taxonomies so mapping between them should be simple.
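
Since the identifiers are shared, tallying entries by taxonomy ID rather than by organism name gives counts that do not depend on which name UniProt chose. A minimal sketch, assuming a UniProtKB flat file on standard input whose OX lines look like `OX   NCBI_TaxID=9606;`. The function name `count_taxids` is hypothetical.

```shell
# Hypothetical helper: tally entries by the taxonomy ID on their OX line.
count_taxids() {
    awk '
        /^OX / {
            if (match($0, /NCBI_TaxID=[0-9]+/)) {
                # Skip the 11 characters of "NCBI_TaxID=" and keep the digits.
                print substr($0, RSTART + 11, RLENGTH - 11)
            }
        }
    ' | sort | uniq -c | sort -n
}
```

The output has the same shape as the organism-name tally (count, then key), so a line ending in `9606` would be the Homo sapiens count.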

The handling of archaeological taxa is always a matter of conjecture, since any classification is based on limited information and is subject to change as more examples are discovered and examined. In the case of early humans, the evolutionary relationships are unclear since few examples are known (see Homo (genus)). For the moment NCBI Taxonomy has placed Denisova and Neanderthal as subspecies, presumably because this makes certain types of searches and analysis easier (e.g. using Homo sapiens to provide a reference genome). As the sequence data for these species improves, this positioning will likely change to incorporate the new information.

Reply modified 6.4 years ago • written 6.4 years ago by Hamish