Question: taxonomy comparison
Hi all, let's say I want to know which taxonomic level groups Tribolium castaneum and Drosophila melanogaster. Insects, right? Kind of. NCBI Taxonomy gives me the full lineages:
So the most specific taxonomic grouping shared by the two is Endopterygota; below it, Coleoptera and Diptera separate. Now let's say I have 10 pairs of such species and I want to see how close & distant they are. How can I do this easily (i.e. without long literature searches or coding an NCBI Taxonomy parser)? Thanks! yannick
Thanks guys for the exhaustive responses, but perhaps the key highlight of my question should have been: how can I do this easily and visualize it easily, without coding? I figured out a bullsh** hack that works with MEGAN (MEGAN's first mission is to parse metagenomics BLAST results). First create a file with one line per species, each followed by a comma and a number (the number is irrelevant):
Then open MEGAN (http://biostar.stackexchange.com/questions/111/taxonomy-of-blast-hits) and choose File Menu -> "Import CSV", then Tree Menu -> "Node labels on". If you subsequently do Tree Menu -> "Show Intermediate Labels" twice, you get the following:
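If the species list is long, such a file could be generated with a few lines of Python. This is only a sketch: the file name `species.csv` and the dummy count of 1 are assumptions made for illustration, and `write_megan_csv` is a hypothetical helper, not part of MEGAN.

```python
import csv

# Species to place on the taxonomy tree.
species = ["Tribolium castaneum", "Drosophila melanogaster"]

def write_megan_csv(names, path="species.csv", count=1):
    """Write one 'name,count' line per species (the count is a dummy value)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        for name in names:
            writer.writerow([name, count])

write_megan_csv(species)
```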
This would also work in iTOL (http://itol.embl.de); there you can also paste a list of names and get a tree. I think you have to replace " " by "_", though.
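A small Python sketch of that replacement, in case the list is too long to edit by hand. Whether iTOL actually requires underscores is only the guess stated above, and `to_itol_name` is a hypothetical helper name.

```python
def to_itol_name(name):
    """Replace spaces with underscores in a species name."""
    return name.strip().replace(" ", "_")

names = ["Tribolium castaneum", "Drosophila melanogaster"]

# One converted name per line, ready to paste.
print("\n".join(to_itol_name(n) for n in names))
```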
Well, I'm not sure how to do this without at least a little bit of programming, but this is a simple problem. The Python code below should get you started:
Essentially you just need to split on the
Hope that helps, Will
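For readers who want something concrete, here is a minimal sketch of the splitting idea. It is not the code originally posted in this answer. It assumes the lineages are available as semicolon-separated strings, as NCBI Taxonomy displays them; the two lineages below are abbreviated for illustration, and the function names are hypothetical.

```python
def split_lineage(lineage):
    """Turn 'A; B; C' into ['A', 'B', 'C']."""
    return [taxon.strip() for taxon in lineage.split(";") if taxon.strip()]

def shared_lineage(lineage_a, lineage_b):
    """Return the leading taxa that both lineages have in common."""
    shared = []
    for a, b in zip(split_lineage(lineage_a), split_lineage(lineage_b)):
        if a != b:
            break
        shared.append(a)
    return shared

# Abbreviated, illustrative lineages (not the full NCBI records).
tribolium = ("Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; "
             "Pterygota; Neoptera; Endopterygota; Coleoptera")
drosophila = ("Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; "
              "Pterygota; Neoptera; Endopterygota; Diptera")

common = shared_lineage(tribolium, drosophila)
print(common[-1])  # Endopterygota
```

The last shared taxon is the most specific group containing both species, and the number of shared taxa gives a crude measure of closeness.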
The following script downloads the XML files for the two taxa and extracts each lineage using an XSLT stylesheet. The lineages are then compared side by side using paste, and we count the number of positions where the taxa differ.
The associated stylesheet is:
TEST:
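The same side-by-side comparison can be sketched in Python instead of XSLT and paste. This is a different implementation, not the script from this answer. It assumes NCBI's taxonomy XML layout (a `TaxaSet` containing `Taxon` elements with `ScientificName` and `Lineage` children) and uses a heavily abbreviated inline sample rather than a real download; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import zip_longest

# Abbreviated sample in the assumed NCBI taxonomy XML layout.
SAMPLE_XML = """<TaxaSet>
  <Taxon>
    <ScientificName>Tribolium castaneum</ScientificName>
    <Lineage>Eukaryota; Metazoa; Insecta; Endopterygota; Coleoptera</Lineage>
  </Taxon>
  <Taxon>
    <ScientificName>Drosophila melanogaster</ScientificName>
    <Lineage>Eukaryota; Metazoa; Insecta; Endopterygota; Diptera; Brachycera</Lineage>
  </Taxon>
</TaxaSet>"""

def lineages_from_xml(xml_text):
    """Map each top-level taxon's scientific name to its lineage as a list."""
    root = ET.fromstring(xml_text)
    result = {}
    for taxon in root.findall("Taxon"):  # direct children only
        name = taxon.findtext("ScientificName")
        lineage = taxon.findtext("Lineage", default="")
        result[name] = [t.strip() for t in lineage.split(";") if t.strip()]
    return result

def count_differences(lineage_a, lineage_b):
    """Compare position by position (like paste) and count mismatches."""
    return sum(1 for a, b in zip_longest(lineage_a, lineage_b) if a != b)

lineages = lineages_from_xml(SAMPLE_XML)
n_diff = count_differences(lineages["Tribolium castaneum"],
                           lineages["Drosophila melanogaster"])
print(n_diff)
```

Positions present in only one lineage count as differences here, which is a design choice of this sketch.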
OK, if you want to visualize the information, let's still use XSLT, this time with Graphviz dot. The following stylesheet reads an NCBI XML file with two taxa and generates an input for dot. It counts the maximum number of nodes in both lineages and recursively calls the template 'recursive' to print each lineage. Usage:
Result:
Update: there was a bug in the stylesheet below. I fixed it, but the last nodes were not printed in the following image.
The stylesheet:
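For comparison, a rough Python sketch that produces similar dot input from two lineages. It is not a port of the stylesheet: it assumes the lineages are already available as lists of taxon names, the lineages shown are abbreviated for illustration, and `lineages_to_dot` is a hypothetical name.

```python
def lineages_to_dot(lineages):
    """Build a Graphviz digraph with one parent -> child edge per lineage step."""
    edges = []
    seen = set()
    for lineage in lineages:
        for parent, child in zip(lineage, lineage[1:]):
            if (parent, child) not in seen:  # shared steps are emitted once
                seen.add((parent, child))
                edges.append(f'  "{parent}" -> "{child}";')
    return "digraph taxonomy {\n" + "\n".join(edges) + "\n}"

# Abbreviated, illustrative lineages.
tribolium = ["Insecta", "Endopterygota", "Coleoptera", "Tribolium castaneum"]
drosophila = ["Insecta", "Endopterygota", "Diptera", "Drosophila melanogaster"]

dot_text = lineages_to_dot([tribolium, drosophila])
print(dot_text)
```

Saved to a file such as `taxonomy.dot`, the output can be rendered with `dot -Tpng taxonomy.dot -o taxonomy.png`.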
Very cool :) http://plindenbaum.blogspot.com/2010/06/xsltncbi-taxonomygraphviz-dot.html But might a hack just be to download the XML, grep out the lineage info, then replace the ';' by '" -> "' for Graphviz?
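A quick sketch of that string-replacement hack in Python, assuming the lineage has already been extracted as a "; "-separated string. The function name and the short lineage are illustrative only.

```python
def lineage_to_dot_path(lineage):
    """Turn 'A; B; C' into a one-line dot graph with a chained edge."""
    body = '"' + lineage.strip().replace("; ", '" -> "') + '"'
    return "digraph G { " + body + "; }"

print(lineage_to_dot_path("Insecta; Endopterygota; Diptera"))
# digraph G { "Insecta" -> "Endopterygota" -> "Diptera"; }
```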
Here's some code I wrote that does just that:
A comparison between
Just a word of caution: the NCBI taxonomy isn't very reliable when it comes to the higher-level groupings. For example, they didn't adopt the new animal taxonomy (which groups e.g. nematodes and insects together).
Then where would you go for this kind of information, Michael? Thanks, yannick
I ended up combining the NCBI taxonomy manually with the taxonomies from recent papers (Dunn et al. Nature 2008, Rogozin et al. Genome Biology and Evolution 2009).
Another word of caution (+1 for warning against the NCBI taxonomy) is regarding your definition of close & distant. Taxonomy is very biased in splitting taxa that are near us into many levels, while little creepy crawlers are lumped into much more inclusive groups.