Question: Galaxy Taxonomy Output To Newick Tree
8.1 years ago, Zach Powers wrote:

Hi Biostars,

I would like to take the output of Galaxy's Metagenomic Analyses > Fetch Taxonomic representation and make a taxonomic tree that contains abundance information. For display, I could then experiment with the many excellent visualization tools out there (I am particularly interested in the ETE2 library). This question is very similar to a previous question where the author was recommended to use iTOL for a similar purpose. I am wondering, however, whether it is possible to generate a Newick tree containing abundance information directly from the Galaxy Metagenomic Analyses output, ideally to provide a flexible replacement for the current taxonomic abundance-visualization tool. The values in the Newick tree could actually come from the

The data output is a table with a few identifier columns followed by the kingdom/phylum/class... columns. Here are a few lines (sorry that it wraps):

contig00428    562    root    Bacteria    n    n    n    Proteobacteria    n    n    Gammaproteobacteria    n    n    Enterobacteriales    n    n    Enterobacteriaceae    n    n    n    Escherichia    n    Escherichia coli    n    42469
contig00073    562    root    Bacteria    n    n    n    Proteobacteria    n    n    Gammaproteobacteria    n    n    Enterobacteriales    n    n    Enterobacteriaceae    n    n    n    Escherichia    n    Escherichia coli    n    42945
contig00672    562    root    Bacteria    n    n    n    Proteobacteria    n    n    Gammaproteobacteria    n    n    Enterobacteriales    n    n    Enterobacteriaceae    n    n    n    Escherichia    n    Escherichia coli    n    146126
contig00143    562    root    Bacteria    n    n    n    Proteobacteria    n    n    Gammaproteobacteria    n    n    Enterobacteriales    n    n    Enterobacteriaceae    n    n    n    Escherichia    n    Escherichia coli    n    287840
contig01215    562    root    Bacteria    n    n    n    Proteobacteria    n    n    Gammaproteobacteria    n    n    Enterobacteriales    n    n    Enterobacteriaceae    n    n    n    Escherichia    n    Escherichia coli    n    290448
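For rows shaped like the ones above, one possible way to get a Newick string with abundance is to fold the lineages into a nested tree and count how many rows pass through each taxon. The following is only a minimal Python sketch under stated assumptions: the file is tab-separated, columns 3 through the second-to-last hold the lineage, `n` means "no assignment at this rank", and the per-taxon row count is written in the branch-length slot (`name:count`). The function names `build_tree`, `node_to_newick`, and `rows_to_newick` are made up for illustration and are not part of Galaxy or ETE2.

```python
import re


def build_tree(rows):
    """Fold lineage rows into nested dicts: {"count": int, "children": {...}}."""
    root = {"count": 0, "children": {}}
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        # Assumption: fields[0:2] are identifiers, the last field is not a rank,
        # and "n" marks a rank with no assignment.
        lineage = [f for f in fields[2:-1] if f != "n"]
        node = root
        for name in lineage:
            node = node["children"].setdefault(name, {"count": 0, "children": {}})
            node["count"] += 1  # one more row passes through this taxon
    return root


def safe_label(name):
    """Replace whitespace and Newick punctuation with underscores."""
    return re.sub(r"[\s():,;'\[\]]", "_", name)


def node_to_newick(name, node):
    """Serialize one node as (children)label:count, recursively."""
    kids = ",".join(
        node_to_newick(k, v) for k, v in sorted(node["children"].items())
    )
    subtree = f"({kids})" if kids else ""
    return f"{subtree}{safe_label(name)}:{node['count']}"


def rows_to_newick(rows):
    """Return a Newick string for all rows; counts sit in the branch-length slot."""
    top = build_tree(rows)["children"]
    parts = [node_to_newick(k, v) for k, v in sorted(top.items())]
    if len(parts) == 1:
        return parts[0] + ";"
    return "(" + ",".join(parts) + ");"
```

Calling `rows_to_newick(open("taxonomy.tsv"))` (the file name is a placeholder) would give a single string. Putting the count in the branch-length slot is a convenience so that any reader accepting internal node names can pick it up; a tool-specific annotation may be a better fit depending on the viewer.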

Alternatively, the 'Summarize Taxonomy' tool provides the following:

superkingdom    Eukaryota    9
superkingdom    Viruses    1
kingdom    Fungi    2
kingdom    Metazoa    5
kingdom    Viridiplantae    1
subkingdom    Dikarya    2
superphylum    Bacteroidetes/Chlorobi group    22
superphylum    Chlamydiae/Verrucomicrobia group    6
superphylum    Fibrobacteres/Acidobacteria group    8
phylum    Acidobacteria    8
phylum    Actinobacteria    9
phylum    Apicomplexa    1

Any ideas would be appreciated. zach cp

galaxy taxonomy • 3.0k views

Zach, how big (# of terminal nodes) are your trees?

written 8.1 years ago by jhc

jhc, the tree size varies depending on the experiment, but a few thousand terminal nodes is typical. I actually came across your Taxonomy Lookup and have been playing with it. It is similar to Pierre's but outputs a Newick tree, which I can then use with the ETE2 library for custom layout (which is awesome, thanks!).

written 8.1 years ago by Zach Powers

8.1 years ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

The following Java code, https://gist.github.com/2787783, takes two parameters:

  • a directory for the dump of NCBI taxonomy
  • a file containing a list of taxonomy-id

    javac Biostar45691.java
    java Biostar45691 NCBITAXONOMY taxonsidlist.txt > result.gexf

It then produces an XML-based graph (Gexf):

    <graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
      <key id="taxname" for="node" attr.name="Taxon.Name" attr.type="string"/>
      <key id="count" for="node" attr.name="Count" attr.type="int"/>
      <key id="countleaf" for="node" attr.name="CountLeaf" attr.type="int"/>
      <graph edgedefault="directed">
        <node id="28216">
          <data key="taxname">Betaproteobacteria</data>
          <data key="count">1</data>
          <data key="countleaf">0</data>
        </node>
        <node id="42256">
          <data key="taxname">Rubrobacter radiotolerans</data>
          <data key="count">1</data>
          <data key="countleaf">1</data>
        </node>
        <node id="2748">
          <data key="taxname">Carnobacterium divergens</data>
          <data key="count">1</data>
          <data key="countleaf">1</data>
        </node>
        <node id="2">
        (...)
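If you would rather read the per-taxon counts programmatically than open the file in a viewer, the node records can be pulled out with Python's standard library. This is a minimal sketch, assuming the file has the shape shown above (GraphML namespace, `node` elements carrying `data` children keyed `taxname`, `count`, and `countleaf`); the function name `read_taxon_counts` is made up for illustration and is not part of the gist.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the sample output above.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def read_taxon_counts(source):
    """Return {node_id: {"taxname": str, "count": int, "countleaf": int}}.

    `source` may be a file name or an open file object.
    """
    root = ET.parse(source).getroot()
    taxa = {}
    for node in root.iterfind(".//g:node", NS):
        # Collect the <data key="...">text</data> children of this node.
        data = {d.get("key"): (d.text or "") for d in node.iterfind("g:data", NS)}
        taxa[node.get("id")] = {
            "taxname": data.get("taxname", ""),
            "count": int(data.get("count") or 0),
            "countleaf": int(data.get("countleaf") or 0),
        }
    return taxa
```

The returned dictionary is keyed by the node `id` (the NCBI taxonomy id in the sample), so the counts can be joined to any other table that uses the same ids.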

You can open this kind of graph using Gephi and visualize the usage of each taxon.

In the example below, I've displayed the taxa for about 100 nucleotide sequences having "16S RNA" in their title. You can use either the number of times a taxon and its parents were found (lower) or only the count of each taxon from your list (upper).

[image]


Pierre, thank you. This is basically what I am looking for except that I am unfamiliar with the gexf format. Is it possible to control tree shape/appearance? -- (I don't see anything like a phylogenetic tree in the Gephi documentation.)

written 8.1 years ago by Zach Powers

GEXF is a simple XML-based format that describes a graph (nodes + edges) and is used by Gephi. Once the file is opened in Gephi, you can customize the graph (colors, layout, etc.): http://gexf.net/format/basic.html

written 8.1 years ago by Pierre Lindenbaum