You may find MEGAN very useful for exploring your data (though I doubt it can make donut plots; maybe the 5th version, currently in testing, can). Its main purpose is taxonomic classification (using the NCBI taxonomy tree by default) from the output of BLAST-like similarity searches on metagenomic data, but it also has an option to supply just a seqID-to-taxID mapping file. You can then use the very convenient tree browser to collapse whatever groups you want and make "abundance" plots (histograms, among others) from any branch or level of the tree you select. Last but not least, you can do all of that with several conditions at the same time.
Counting the number of proteins per species can be easy: for each UniProt protein ID, query UniProt for the species and store the result in an associative array whose keys are species and whose values are lists of UniProt IDs. At the end, count the number of values stored under each key.
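The grouping step described above could be sketched roughly as follows in Python. This is only a minimal illustration: the UniProt lookup itself is left out, and the `example_lookup` dictionary (ID to species) is a hypothetical stand-in for whatever your UniProt query returns.

```python
from collections import defaultdict


def group_by_species(protein_to_species):
    """Build a dict whose keys are species and whose values are lists of UniProt IDs."""
    by_species = defaultdict(list)
    for uniprot_id, species in protein_to_species.items():
        by_species[species].append(uniprot_id)
    return dict(by_species)


def count_per_species(by_species):
    """Count how many UniProt IDs were collected for each species."""
    return {species: len(ids) for species, ids in by_species.items()}


# Hypothetical lookup results (assumption): in practice these pairs would
# come from querying UniProt for each protein ID.
example_lookup = {
    "P60709": "Homo sapiens",
    "P60710": "Mus musculus",
    "P68133": "Homo sapiens",
}

groups = group_by_species(example_lookup)
counts = count_per_species(groups)

for species, n in sorted(counts.items()):
    print(species, n)
```

Keeping the lists of IDs, rather than only the counts, means you can still go back and see which proteins were assigned to each species.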
For the second part I would recommend BayesTraits, written by Mark Pagel and Andrew Meade. The first part is very tricky and depends on the question you are trying to answer. On the one hand, you can search for the homologs of the protein, which is not such a simple task (An example for yeast can be found here). On the other hand, you can ask how many proteins containing the same domain there are in each species; you can get this information from Pfam, for example.
Thanks for letting me know about MEGAN. However, I am struggling to map the seqIDs to taxIDs to see the distribution of sequences across the various taxa. I followed the instructions given here: http://ab.inf.uni-tuebingen.de/data/software/megan4/download/welcome.html However, I cannot import BLAST output, either in the tabular format shown below or in XML format. It fails to import anything.
sp|P60709|ACTB_HUMAN gi|45269029|gb|AAS55927.1| 100.00 375 0 0 1 375 30 404 0.0 786
sp|P60709|ACTB_HUMAN gi|4501885|ref|NP_001092.1| 100.00 375 0 0 1 375 1 375 0.0 785
sp|P60709|ACTB_HUMAN gi|62897409|dbj|BAD96645.1| 99.73 375 1 0 1 375 1 375 0.0 785
sp|P60709|ACTB_HUMAN gi|54696726|gb|AAV38735.1| 100.00 375 0 0 1 375 1 375 0.0 785
Does it fail because it does not understand the format or the sequence IDs?
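One way to narrow this down is to check first that every row really has the 12 columns of standard BLAST tabular output, and that a GI number can be pulled out of each subject ID. The Python sketch below does only that. It does not reproduce MEGAN's own parser, so passing this check does not guarantee a successful import. The function names are made up for illustration, and splitting on whitespace assumes the sequence IDs contain no spaces.

```python
# Default column order of BLAST tabular output (-m 8 / -outfmt 6).
BLAST_TAB_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_blast_tab_line(line):
    """Split one tabular BLAST row into a dict keyed by column name."""
    fields = line.split()  # assumption: IDs contain no whitespace
    if len(fields) != len(BLAST_TAB_COLUMNS):
        raise ValueError(
            "expected %d columns, got %d: %r"
            % (len(BLAST_TAB_COLUMNS), len(fields), line)
        )
    return dict(zip(BLAST_TAB_COLUMNS, fields))


def subject_gi(sseqid):
    """Return the GI number from an ID like 'gi|123|gb|X.1|', or None."""
    parts = sseqid.split("|")
    if len(parts) >= 2 and parts[0] == "gi":
        return parts[1]
    return None


sample = """\
sp|P60709|ACTB_HUMAN gi|45269029|gb|AAS55927.1| 100.00 375 0 0 1 375 30 404 0.0 786
sp|P60709|ACTB_HUMAN gi|54696726|gb|AAV38735.1| 100.00 375 0 0 1 375 1 375 0.0 785
"""

rows = [parse_blast_tab_line(l) for l in sample.splitlines() if l.strip()]
gi_numbers = [subject_gi(r["sseqid"]) for r in rows]
print(gi_numbers)
```

If a row raises the column-count error, the file layout is the likely culprit; if every row parses but the import still fails, the sequence IDs or the mapping file are the more likely place to look.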