Mapping UniProt Proteome identifiers to a taxonomic tree
8.5 years ago
Patrick ▴ 20

I'm trying to map the presence of three different proteins across the bacterial kingdom. I first thought of using Pfam for this, but two of the proteins fall into very messy DUFs (Domains of Unknown Function). So I ran three BLAST queries against the Representative Proteomes (http://pir.georgetown.edu/rps/) with an e-value threshold of 10^-5.

For each protein, this yields a BLAST table that contains both a UniProt Proteome identifier and an NCBI Taxonomy identifier.

Now I want to map these BLAST results back to either a UniProt taxonomy or the NCBI taxonomy. For the NCBI taxonomy there is a tool available (http://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi), but does a similar tool exist for UniProt as well? Preferably I would supply it with a list of UniProt Proteome identifiers and it would return a tree in, for example, PHYLIP format, which could then be visualized using iTOL. I found a neat translation between UniProt and the NCBI taxonomy (http://www.uniprot.org/docs/speclist.txt), but I was wondering whether a UniProt taxonomy exists as well.

tree taxonomy • 3.5k views

I suggest adding the tag "uniprot" to make this thread easier to find.

8.4 years ago

In reply to your last question:

The taxonomy database that is maintained by the UniProt group is based on the NCBI taxonomy database, which is supplemented with data specific to the UniProt Knowledgebase (UniProtKB). While the NCBI taxonomy is updated daily to be in sync with GenBank/EMBL-Bank/DDBJ, the UniProt taxonomy is updated only at UniProt releases to be in sync with UniProtKB. It may therefore happen that for the time period of a UniProt release, you can find new taxa at the NCBI that are not yet in UniProt (and vice versa for deleted taxa).

For more details see http://www.uniprot.org/help/taxonomy
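
Since the UniProt taxonomy is based on the NCBI taxonomy, the tax_ids in the BLAST table can be placed on an NCBI-style tree directly. Below is a minimal sketch of how that might look, assuming a child-to-parent table in the layout of `nodes.dmp` from the NCBI taxdump. The function names are hypothetical and the sample lines are a toy hierarchy with intermediate ranks left out, not the real NCBI parentage. It writes Newick, the tree format PHYLIP uses, which iTOL can read.

```python
def parse_nodes(lines):
    """Parse nodes.dmp-style lines into a child -> parent dict."""
    parent = {}
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 2 and fields[0]:
            parent[int(fields[0])] = int(fields[1])
    return parent


def lineage(tax_id, parent):
    """Return the path from the root down to tax_id."""
    path = [tax_id]
    # The root is its own parent in nodes.dmp; unknown IDs also stop the walk.
    while parent.get(path[-1], path[-1]) != path[-1]:
        path.append(parent[path[-1]])
    return path[::-1]


def to_newick(tax_ids, parent):
    """Build a Newick string covering the lineages of the given tax_ids."""
    children = {}
    root = None
    for tax_id in tax_ids:
        path = lineage(tax_id, parent)
        root = path[0]
        for up, down in zip(path, path[1:]):
            children.setdefault(up, set()).add(down)

    def render(node):
        kids = sorted(children.get(node, ()))
        if not kids:
            return str(node)
        return "(" + ",".join(render(k) for k in kids) + ")" + str(node)

    return render(root) + ";"


# Toy hierarchy (assumption for illustration only; intermediate ranks omitted).
toy_nodes = [
    "1\t|\t1\t|\tno rank\t|",
    "2\t|\t1\t|\tsuperkingdom\t|",
    "562\t|\t2\t|\tspecies\t|",
    "83333\t|\t562\t|\tstrain\t|",
    "83334\t|\t562\t|\tstrain\t|",
    "224308\t|\t2\t|\tstrain\t|",
]
toy_parent = parse_nodes(toy_nodes)
tree = to_newick([83333, 83334, 224308], toy_parent)
print(tree)
```

Internal nodes with a single child are kept as they are; you may want to collapse them before visualizing the tree.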

Regarding a mapping of proteome identifiers to tax_ids, you can query the "Proteomes" section of the UniProt website and then download the results in tab-delimited format, e.g. for non-redundant bacterial proteomes (limited to the first 10 hits):

http://www.uniprot.org/proteomes/?sort=&desc=&compress=no&query=taxonomy:%22Bacteria%20[2]%22%20redundant:no&fil=&limit=10&force=no&preview=true&format=tab&columns=id,organism-id

Proteome ID Organism ID
UP000000579 71421
UP000000558 83334
UP000000625 83333
UP000002524 243230
UP000000798 224324
UP000012042 1001583
UP000001258 272558
UP000001807 224326
UP000001570 224308
UP000002519 83334
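
A table like this can be loaded into a lookup and applied to the proteome identifiers from the BLAST hits. This is only a sketch, assuming the tab-delimited download keeps the "Proteome ID" and "Organism ID" headers shown above; the list of hit proteomes and the function name are made up for illustration. Note that several proteomes can share one tax_id (both UP000000558 and UP000002519 map to 83334), so the tax_ids are deduplicated.

```python
import csv
import io


def read_proteome_map(handle):
    """Read a tab-delimited proteome table into a proteome ID -> tax_id dict."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Proteome ID"]: int(row["Organism ID"]) for row in reader}


# A few rows from the table above, as they would appear in the download.
example = (
    "Proteome ID\tOrganism ID\n"
    "UP000000579\t71421\n"
    "UP000000558\t83334\n"
    "UP000002519\t83334\n"
)
proteome_to_taxid = read_proteome_map(io.StringIO(example))

# Hypothetical proteome IDs taken from a BLAST result table.
hit_proteomes = ["UP000000558", "UP000002519", "UP000000579"]

# Unique tax_ids with at least one hit.
hit_taxids = sorted({proteome_to_taxid[p] for p in hit_proteomes})
print(hit_taxids)
```

For a real download, open the saved file with `open(path, newline="")` and pass the handle in place of the `io.StringIO` object.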