Question: Mapping UniProt Proteome identifiers to a taxonomic tree
3.4 years ago by
Patrick20
European Union
Patrick20 wrote:

I'm trying to map the presence of three different proteins across the bacterial kingdom. I first thought of using Pfam for this, but two of the proteins end up in very messy DUFs (Domains of Unknown Function). So I ran three BLAST queries against the Representative Proteomes (http://pir.georgetown.edu/rps/) with an e-value threshold of 1e-5.
For each protein, this yields a BLAST table that contains both a UniProt Proteome identifier and an NCBI Taxonomy identifier.
Now I want to map these BLAST results back to either a UniProt taxonomy or the NCBI taxonomy. For the NCBI taxonomy a tool is available (http://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi), but does a similar taxonomy tool exist for UniProt as well? Preferably I would supply it with a list of UniProt Proteome identifiers and it would return a tree (for example, in PHYLIP format) that could then be visualized using iTOL. I found a neat translation between UniProt and the NCBI taxonomy (http://www.uniprot.org/docs/speclist.txt), but I was wondering whether a UniProt taxonomy exists as well.
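
As a rough illustration of the filtering step described above, the sketch below applies the 1e-5 cutoff to three BLAST tables and records which query proteins are present for each taxonomy identifier. The three-column layout (proteome ID, taxonomy ID, e-value), the protein names, and the e-values are all made up for the example; the real tables will have more columns and need their own parsing.

```python
import csv
import io
from collections import defaultdict

E_VALUE_CUTOFF = 1e-5  # threshold mentioned in the question

# Hypothetical, simplified BLAST tables: protein name -> tab-separated rows of
# (proteome ID, NCBI taxonomy ID, e-value). The values are invented.
BLAST_TABLES = {
    "proteinA": "UP000000625\t83333\t1e-30\nUP000001570\t224308\t0.002\n",
    "proteinB": "UP000000625\t83333\t3e-12\nUP000002524\t243230\t2e-7\n",
    "proteinC": "UP000001570\t224308\t4e-8\n",
}


def presence_by_taxon(tables, cutoff=E_VALUE_CUTOFF):
    """Return {taxonomy ID: set of proteins with a hit at or below the cutoff}."""
    presence = defaultdict(set)
    for protein, text in tables.items():
        rows = csv.reader(io.StringIO(text), delimiter="\t")
        for proteome_id, tax_id, e_value in rows:
            if float(e_value) <= cutoff:
                presence[tax_id].add(protein)
    return dict(presence)


presence = presence_by_taxon(BLAST_TABLES)
for tax_id in sorted(presence, key=int):
    print(tax_id, ",".join(sorted(presence[tax_id])))
```

A per-taxon presence table of this kind could then serve as the annotation data for whatever tree is eventually drawn.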

 

 

tree taxonomy latest • 1.7k views

I suggest adding the tag "uniprot" to make this thread easier to find.

written 3.4 years ago by Elisabeth Gasteiger
3.4 years ago by
Geneva
Elisabeth Gasteiger1.6k wrote:

In reply to your last question:

The taxonomy database that is maintained by the UniProt group (http://www.uniprot.org/taxonomy/) is based on the NCBI taxonomy database, which is supplemented with data specific to the UniProt Knowledgebase (UniProtKB). While the NCBI taxonomy is updated daily to be in sync with GenBank/EMBL-Bank/DDBJ, the UniProt taxonomy is updated only at UniProt releases to be in sync with UniProtKB. It may therefore happen that, until the next UniProt release, you can find new taxa at the NCBI that are not yet in UniProt (and vice versa for deleted taxa).

For more details see http://www.uniprot.org/help/taxonomy

 

Regarding a mapping of proteome identifiers to tax_ids, you can query the "Proteomes" section of the UniProt website and then download the results in tab-delimited format, e.g. for non-redundant bacterial proteomes (limited to the first 10 hits):

http://www.uniprot.org/proteomes/?sort=&desc=&compress=no&query=taxonomy:%22Bacteria%20[2]%22%20redundant:no&fil=&limit=10&force=no&preview=true&format=tab&columns=id,organism-id

Proteome ID    Organism ID
UP000000579    71421
UP000000558    83334
UP000000625    83333
UP000002524    243230
UP000000798    224324
UP000012042    1001583
UP000001258    272558
UP000001807    224326
UP000001570    224308
UP000002519    83334
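
A minimal sketch of how such a download could be used: it loads the tab-delimited mapping into a dictionary and converts a list of proteome identifiers into unique tax_ids, which could then be given to a taxonomy tool such as the NCBI Common Tree page mentioned in the question. The mapping text is the ten rows shown above; the `hits` list and the function names are hypothetical, and a real download would be read from a file.

```python
import csv
import io

# The ten rows shown above, as tab-delimited text.
MAPPING_TSV = """Proteome ID\tOrganism ID
UP000000579\t71421
UP000000558\t83334
UP000000625\t83333
UP000002524\t243230
UP000000798\t224324
UP000012042\t1001583
UP000001258\t272558
UP000001807\t224326
UP000001570\t224308
UP000002519\t83334
"""


def load_mapping(text):
    """Parse the tab-delimited download into {proteome ID: tax_id}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["Proteome ID"]: row["Organism ID"] for row in reader}


def unique_tax_ids(proteome_ids, mapping):
    """Map proteome IDs to tax_ids, dropping duplicates in first-seen order.

    Also returns the proteome IDs that are absent from the mapping.
    """
    seen = {}
    missing = []
    for pid in proteome_ids:
        if pid in mapping:
            seen.setdefault(mapping[pid], None)
        else:
            missing.append(pid)
    return list(seen), missing


mapping = load_mapping(MAPPING_TSV)
# Hypothetical proteome IDs taken from BLAST hits; the last one is not mapped.
hits = ["UP000000558", "UP000002519", "UP000000625", "UP999999999"]
tax_ids, missing = unique_tax_ids(hits, mapping)
print("\n".join(tax_ids))
print("unmapped:", missing)
```

Note that two proteomes in the sample (UP000000558 and UP000002519) share tax_id 83334, which is why the sketch deduplicates before producing the list.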
