OK, if you want to visualize the information, let's still use XSLT, together with Graphviz dot. The following stylesheet reads an NCBI taxonomy XML file containing two taxa and generates input for dot. It counts the nodes in each lineage, then recursively calls the template 'recursive' until it has walked past the end of the longer lineage, printing both lineages:
Usage:
xsltproc --novalid taxonomy2dot.xsl \
"http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?id=7070,32351&db=taxonomy&retmode=xml" |\
dot -ofile.jpg -Tjpg
Result:
Update: there was a bug in the stylesheet below, which I have since fixed. However, the result image was generated before the fix, so its last nodes were not printed.
The Stylesheet:
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
version='1.0'
>
<xsl:output method="text"/>
<xsl:variable name="lineage1" select="/TaxaSet/Taxon[1]/LineageEx/Taxon"/>
<xsl:variable name="count1" select="count($lineage1)"/>
<xsl:variable name="lineage2" select="/TaxaSet/Taxon[2]/LineageEx/Taxon"/>
<xsl:variable name="count2" select="count($lineage2)"/>
<xsl:template match="/">
digraph G
{
<xsl:call-template name="recursive">
<xsl:with-param name="index" select="number(1)"/>
</xsl:call-template>
}
</xsl:template>
<xsl:template match="Taxon">
<xsl:value-of select="concat('Tax',TaxId)"/>[label="<xsl:value-of select="ScientificName"/>"];
</xsl:template>
<xsl:template name="recursive">
<xsl:param name="index"/>
<xsl:variable name="tax1" select="$lineage1[$index]/TaxId"/>
<xsl:variable name="tax2" select="$lineage2[$index]/TaxId"/>
<xsl:choose>
<xsl:when test="$index > $count1 and $index > $count2 "></xsl:when>
<xsl:when test="$index > $count1">
<xsl:apply-templates select="$lineage2[$index]"/>
<xsl:value-of select="concat('Tax',$lineage2[$index - 1]/TaxId,' -> Tax',$tax2)"/>;
<xsl:call-template name="recursive">
<xsl:with-param name="index" select="$index +1"/>
</xsl:call-template>
</xsl:when>
<xsl:when test="$index > $count2">
<xsl:apply-templates select="$lineage1[$index]"/>
<xsl:value-of select="concat('Tax',$lineage1[$index - 1]/TaxId,' -> Tax',$tax1)"/>;
<xsl:call-template name="recursive">
<xsl:with-param name="index" select="$index +1"/>
</xsl:call-template>
</xsl:when>
<xsl:when test="$tax1 != $tax2">
<!-- at index 1 there is no parent, so skip the edge (otherwise dot gets a bogus node "Tax") -->
<xsl:apply-templates select="$lineage2[$index]"/>
<xsl:if test="$index > 1">
<xsl:value-of select="concat('Tax',$lineage2[$index - 1]/TaxId,' -> Tax',$tax2)"/>;
</xsl:if>
<xsl:apply-templates select="$lineage1[$index]"/>
<xsl:if test="$index > 1">
<xsl:value-of select="concat('Tax',$lineage1[$index - 1]/TaxId,' -> Tax',$tax1)"/>;
</xsl:if>
<xsl:call-template name="recursive">
<xsl:with-param name="index" select="$index +1"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="$lineage1[$index]"/>
<xsl:if test="$index > number(1)">
<xsl:value-of select="concat('Tax',$lineage1[$index - 1]/TaxId,' -> Tax',$tax1)"/>;
</xsl:if>
<xsl:call-template name="recursive">
<xsl:with-param name="index" select="$index +1"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
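If XSLT is not a requirement, the same merge can be sketched in Python with the standard `xml.etree.ElementTree` module. This is not a line-by-line translation of the stylesheet. Instead of walking the two lineages index by index, it collects nodes and edges from every `Taxon` in the `TaxaSet` and drops duplicates. Shared ancestors therefore appear only once, and the approach also works for more than two taxa. The function name `taxa_to_dot` and the toy `SAMPLE` document are hypothetical. The sketch assumes the same `LineageEx/Taxon` layout (with `TaxId` and `ScientificName` children) that the stylesheet reads.

```python
import xml.etree.ElementTree as ET


def taxa_to_dot(xml_text):
    """Return Graphviz dot text for every lineage in an NCBI-style TaxaSet.

    Assumes each /TaxaSet/Taxon holds LineageEx/Taxon elements ordered from
    the root downwards, each with TaxId and ScientificName children. Like the
    stylesheet, only the LineageEx ancestors are drawn.
    """
    root = ET.fromstring(xml_text)
    labels = {}        # TaxId -> label; dicts keep insertion order
    edges = []         # (parent TaxId, child TaxId) in first-seen order
    seen_edges = set()
    for taxon in root.findall("Taxon"):
        parent = None
        for node in taxon.findall("LineageEx/Taxon"):
            tax_id = node.findtext("TaxId")
            # A shared ancestor is declared once, however many lineages contain it.
            labels.setdefault(tax_id, node.findtext("ScientificName", default=""))
            edge = (parent, tax_id)
            if parent is not None and edge not in seen_edges:
                seen_edges.add(edge)
                edges.append(edge)
            parent = tax_id
    lines = ["digraph G", "{"]
    for tax_id, name in labels.items():
        escaped = name.replace('"', '\\"')  # keep the dot label quoting intact
        lines.append(f'Tax{tax_id}[label="{escaped}"];')
    for parent, child in edges:
        lines.append(f"Tax{parent} -> Tax{child};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical toy input: IDs and names are made up, not real NCBI records.
SAMPLE = """<TaxaSet>
  <Taxon><TaxId>10</TaxId><ScientificName>species X</ScientificName>
    <LineageEx>
      <Taxon><TaxId>1</TaxId><ScientificName>root group</ScientificName></Taxon>
      <Taxon><TaxId>2</TaxId><ScientificName>group A</ScientificName></Taxon>
    </LineageEx>
  </Taxon>
  <Taxon><TaxId>20</TaxId><ScientificName>species Y</ScientificName>
    <LineageEx>
      <Taxon><TaxId>1</TaxId><ScientificName>root group</ScientificName></Taxon>
      <Taxon><TaxId>3</TaxId><ScientificName>group B</ScientificName></Taxon>
    </LineageEx>
  </Taxon>
</TaxaSet>"""

if __name__ == "__main__":
    print(taxa_to_dot(SAMPLE))
```

On the toy input, the shared ancestor `Tax1` is declared once, followed by the edges `Tax1 -> Tax2;` and `Tax1 -> Tax3;`. The output can be piped into `dot -Tjpg` just like the xsltproc output.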
Just a word of caution: the NCBI taxonomy isn't very reliable for the higher-level groupings. For example, they didn't adopt the new animal taxonomy (which, e.g., groups nematodes and insects together).
Then where would you go for this kind of information, Michael? Thanks, yannick
I ended up manually combining the NCBI taxonomy with the taxonomies from recent papers (Dunn et al., Nature 2008; Rogozin et al., Genome Biology and Evolution 2009).
Another word of caution (+1 for the warning about the NCBI taxonomy) concerns your definition of close and distant. The taxonomy is strongly biased: taxa near us are split into many levels, while little creepy crawlers are lumped into much more inclusive groups.
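To make that caution concrete, consider the shared-prefix walk the stylesheet performs. It suggests a naive "closeness" score: the number of leading ancestors two lineages have in common. The sketch below uses a hypothetical helper, `shared_ranks`, and made-up lineages. It shows why that count is not comparable across clades. A clade that has been split into many named levels inflates the number without saying anything about how long ago the two taxa actually diverged.

```python
def shared_ranks(lineage_a, lineage_b):
    """Count the leading ancestors two root-first lineages have in common."""
    shared = 0
    for a, b in zip(lineage_a, lineage_b):
        if a != b:
            break  # in a tree, lineages never re-join after they split
        shared += 1
    return shared


# Hypothetical lineages, root first. Each pair splits just below its last
# shared group, but clade "B" has been divided into more named levels.
coarse_pair = (["root", "A", "a1"], ["root", "A", "a2"])
fine_pair = (["root", "B", "B1", "B2", "B3", "b1"],
             ["root", "B", "B1", "B2", "B3", "b2"])

if __name__ == "__main__":
    print(shared_ranks(*coarse_pair), shared_ranks(*fine_pair))  # 2 5
```

The second pair scores 5 and the first scores 2 purely because of how many levels the taxonomy names in each clade. Ranking pairs by this count would therefore reproduce the bias described above.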