Get species/genus names from NCBI nr protein accession IDs for phylogenetic tree annotation?
3.3 years ago
izhang • 0

I have a list of protein accession IDs from the NCBI nr database that look like this:

WP_0445013 WP_1884344 TBR13838

These are all bacterial proteins from a range of different bacteria, and I've built a phylogenetic tree from them. However, the tip labels on the tree are these accession IDs, and I want to annotate it with taxonomy instead. I'm not very familiar with the Entrez system; is there an easy way to replace the accession IDs with the taxonomy of each sequence, such as genus and species names?

Any help is appreciated, thanks!


You could do something like the following using EntrezDirect:

$ esearch -db protein -query "WP_000445013.1" | efetch -format docsum | xtract -pattern DocumentSummary -element Organism
Escherichia coli

Note that the example accessions you posted don't seem to be valid. Also keep in mind that a WP_ accession (a non-redundant RefSeq protein record) can refer to multiple organisms.
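
To run that lookup over a whole list and then write the names into the tree, something like the sketch below might work. It assumes EntrezDirect is on your PATH and that the tree is in Newick format with unquoted labels that match the accessions exactly. The function names (fetch_organisms, relabel_tree) and file names (accessions.txt, acc2org.tsv, tree.nwk) are made up for illustration. Since a WP_ accession can map to several organisms, only the first Organism value returned is kept, and the accession stays in the label so the tips remain unique.

```shell
#!/usr/bin/env bash
# Sketch only: assumes EntrezDirect (esearch, efetch, xtract) is installed.

# Print "accession<TAB>organism" for every accession in file $1 (one per line).
# Reuses the esearch | efetch | xtract pipeline shown above.
fetch_organisms() {
    local acc org
    while read -r acc; do
        [ -z "$acc" ] && continue
        # </dev/null stops esearch from swallowing the rest of the list on stdin
        org=$(esearch -db protein -query "$acc" </dev/null |
              efetch -format docsum |
              xtract -pattern DocumentSummary -element Organism |
              head -n 1)
        printf '%s\t%s\n' "$acc" "${org:-NA}"
    done < "$1"
}

# Relabel tips in Newick file $2 using the mapping file $1.
# Each matching label becomes accession_Genus_species; spaces and other
# special characters in the organism name are turned into underscores.
relabel_tree() {
    awk -F'\t' '
        NR == FNR {
            gsub(/[^A-Za-z0-9._-]/, "_", $2)
            map[$1] = $1 "_" $2
            next
        }
        {
            out = ""; tok = ""
            n = length($0)
            for (i = 1; i <= n; i++) {
                c = substr($0, i, 1)
                if (index("(),:;", c) > 0) {
                    # end of a token: swap it only if it is a known accession
                    out = out ((tok in map) ? map[tok] : tok) c
                    tok = ""
                } else {
                    tok = tok c
                }
            }
            print out ((tok in map) ? map[tok] : tok)
        }
    ' "$1" "$2"
}

# Example usage (hypothetical file names):
#   fetch_organisms accessions.txt > acc2org.tsv
#   relabel_tree acc2org.tsv tree.nwk > tree.labelled.nwk
```

Labels are matched as whole tokens rather than by substring, so an accession that was truncated in the tree will simply be left unchanged instead of being relabelled with the wrong organism.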


Thank you, that works! It seems my PHYLIP conversion program truncated some of the accession numbers. I retrieved these proteins from NCBI nr, but is there a place where I can download the entire set of complete, annotated bacterial genomes? I'm trying to look at the evolution of a widespread metabolic pathway across as many bacteria as possible.
