I have a list of 500 proteins with their sequences, GI numbers, the organisms they belong to, and GenBank IDs. I want to create a pie chart of the taxonomic phyla of these proteins. How can I proceed?
So far, my plan is to map the GI/GenBank IDs to taxonomy IDs and then use those taxonomy IDs to get the list of phyla. However, I have not found an option in the UniProt browser for mapping GI/GenBank IDs to taxonomy IDs, although the NCBI Taxonomy browser does have an option to enter a list of organisms and get their taxonomy IDs. But then how do I get the list of phyla? For the final output, I'm expecting something like this:
Phylum Counts
Proteobacteria 300
Acidobacteria 100
Cyanobacteria 100
--------------------------
Total 500
Since I have the GI numbers, I can match them against the NCBI taxonomy file that maps GI to taxid. But looking up the lineage and getting the phylum name for all 500 taxids in a single run does not seem that easy. It looks easy when you have a single taxid.
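One way to handle many taxids in a single run might be to deduplicate them and pack them into a few EUtils efetch requests. The sketch below only builds the request URLs and does not contact NCBI. It assumes that efetch with db=taxonomy accepts a comma-separated id list; the function name `batched_efetch_urls` and the batch size of 200 are illustrative choices, not anything prescribed by NCBI.

```python
from urllib.parse import urlencode

# NCBI EUtils efetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batched_efetch_urls(taxids, batch_size=200):
    """Return efetch URLs covering all unique taxids, batch_size IDs per URL.

    batch_size=200 is an arbitrary, illustrative value (an assumption).
    """
    # Many proteins share an organism, so look up each taxid only once.
    unique = sorted(set(int(t) for t in taxids))
    urls = []
    for start in range(0, len(unique), batch_size):
        chunk = unique[start:start + batch_size]
        query = urlencode(
            {
                "db": "taxonomy",
                "id": ",".join(str(t) for t in chunk),
                "retmode": "xml",
            },
            safe=",",  # keep the commas readable instead of %2C
        )
        urls.append(f"{EFETCH}?{query}")
    return urls
```

Each URL could then be fetched with any HTTP client, and the XML responses parsed for the lineage. With 500 proteins the number of distinct taxids is often much smaller than 500, so a handful of requests may be enough.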
I don't understand. Why is it not easy? What is the problem? You will have to be more specific if you want help.
To my knowledge, the lineage always comes in the form organism, superkingdom, phylum, class, order, family, genus, species, at least from the NCBI EUtils service. I guess some of the last ranks depend on the tax ID you use for the lookup: if you look something up by its family tax ID, for example, you probably won't get genus and species.
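Given that rank-labelled lineage, picking out the phylum and counting it per protein could look roughly like the sketch below. It assumes the taxonomy XML has a `LineageEx` element whose child `Taxon` entries each carry a `Rank`; the `SAMPLE_XML` string is a hand-written, trimmed illustration rather than real service output, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written, trimmed sample (an assumption about the layout, not real output).
SAMPLE_XML = """<TaxaSet>
  <Taxon>
    <TaxId>562</TaxId>
    <ScientificName>Escherichia coli</ScientificName>
    <Rank>species</Rank>
    <LineageEx>
      <Taxon><TaxId>2</TaxId><ScientificName>Bacteria</ScientificName><Rank>superkingdom</Rank></Taxon>
      <Taxon><TaxId>1224</TaxId><ScientificName>Proteobacteria</ScientificName><Rank>phylum</Rank></Taxon>
    </LineageEx>
  </Taxon>
  <Taxon>
    <TaxId>1148</TaxId>
    <ScientificName>Synechocystis sp. PCC 6803</ScientificName>
    <Rank>species</Rank>
    <LineageEx>
      <Taxon><TaxId>2</TaxId><ScientificName>Bacteria</ScientificName><Rank>superkingdom</Rank></Taxon>
      <Taxon><TaxId>1117</TaxId><ScientificName>Cyanobacteria</ScientificName><Rank>phylum</Rank></Taxon>
    </LineageEx>
  </Taxon>
</TaxaSet>"""


def phylum_by_taxid(xml_text):
    """Map each queried taxid (as a string) to its phylum name."""
    root = ET.fromstring(xml_text)
    result = {}
    for taxon in root.findall("Taxon"):  # top-level records only
        phylum = "unassigned"
        if taxon.findtext("Rank") == "phylum":
            # The queried taxid is itself a phylum.
            phylum = taxon.findtext("ScientificName")
        else:
            for ancestor in taxon.findall("LineageEx/Taxon"):
                if ancestor.findtext("Rank") == "phylum":
                    phylum = ancestor.findtext("ScientificName")
                    break
        result[taxon.findtext("TaxId")] = phylum
    return result


def count_phyla(protein_taxids, taxid_to_phylum):
    """Count proteins per phylum; taxids without a known phylum are 'unassigned'."""
    return Counter(
        taxid_to_phylum.get(str(t), "unassigned") for t in protein_taxids
    )


def format_table(counts):
    """Render the counts as a plain-text 'Phylum Counts' table with a total."""
    lines = ["Phylum Counts"]
    lines += [f"{name} {n}" for name, n in counts.most_common()]
    lines.append("-" * 26)
    lines.append(f"Total {sum(counts.values())}")
    return "\n".join(lines)
```

Selecting by `Rank` rather than by position in the lineage avoids depending on how many ranks a given tax ID returns. If matplotlib is available, the resulting counter could be passed to `plt.pie(list(counts.values()), labels=list(counts.keys()))` to draw the pie chart.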