Question: Newick 2 Json Converter (Preferably In Perl)
7.0 years ago by
Fabsta120 wrote:

Hi! Does anyone know a Perl snippet that converts a tree in Newick format into a JSON string?

I had a look at Bio::Phylo, but could not really find a solution.

Any help is much appreciated.

An example Newick string could be (sorry for the length):


Thanks a lot in advance, Fabian

perl • 4.1k views
modified 7.0 years ago by Damian Kao15k • written 7.0 years ago by Fabsta120

JSON is just a general data structure specification, whereas Newick is specifically used for trees. I don't think there are any standardized rules for representing Newick in JSON. You'll have to tailor something according to how you want to use the JSON data. Why do you want to convert it to JSON?

written 7.0 years ago by Damian Kao15k
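
To make the "no standardized rules" point concrete, here is a small sketch of two equally reasonable JSON layouts for the same tree. The tree (A,B,(C,D)E)F and both layouts are just illustrations chosen for this example, not an agreed convention.

```python
import json

# Tree used for illustration: (A,B,(C,D)E)F

# Layout 1: every node is an object with "name" and optional "children".
nested_objects = {
    "name": "F",
    "children": [
        {"name": "A"},
        {"name": "B"},
        {"name": "E", "children": [{"name": "C"}, {"name": "D"}]},
    ],
}

# Layout 2: internal nodes are single-key objects, leaves are bare strings.
keyed_lists = {"F": ["A", "B", {"E": ["C", "D"]}]}

print(json.dumps(nested_objects))
print(json.dumps(keyed_lists))
```

Which one is preferable depends on the consumer: the first is easier to extend with extra per-node fields, the second is more compact.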

Sounds fun. Do you have a sample file, please?

written 7.0 years ago by Pierre Lindenbaum121k

I smell a round of code golf happening...

written 7.0 years ago by Damian Kao15k

Yes, I updated the post. Looking forward to a solution :-)

written 7.0 years ago by Fabsta120

How do you expect the JSON to look?

written 7.0 years ago by asjo120
7.0 years ago by
asjo120 wrote:

Here is an attempt, with the caveat that your example is really complicated, and that you haven't really defined what the JSON result should look like:


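use strict;
use warnings;

use Bio::Phylo::IO;
use JSON;

my $forest=Bio::Phylo::IO->parse(-file=>"example.newick", -format=>"newick");

while (my $tree=$forest->next) {
    my $out=[];
    my $children=$out;   # list that the next visited node is appended to
    my $cur;             # hash of the node visited most recently
    my @stack;           # enclosing child lists, so we can climb back up
    $tree->visit_depth_first(
        # Entering a node: record its name in the current child list.
        -pre=>sub { $cur={ name=>shift->get_name }; push @$children, $cur; },
        # About to descend: remember the current list, then switch to this node's children.
        -pre_daughter=>sub { push @stack, $children; $cur->{children}=[]; $children=$cur->{children} },
        # Done with the daughters: return to the enclosing list.
        # (This handler is a reconstruction; the corresponding part of the post was lost.)
        -post_daughter=>sub { $children=pop @stack },
    );
    print JSON->new->pretty->encode($out);
}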

If I run it on a smaller example (from Wikipedia's entry on Newick format):


I get this:

[
   {
      "name" : "F",
      "children" : [
         {
            "name" : "A"
         },
         {
            "name" : "B"
         },
         {
            "name" : "E",
            "children" : [
               {
                  "name" : "C"
               },
               {
                  "name" : "D"
               }
            ]
         }
      ]
   }
]

But again, I don't know how you want the JSON output formatted.

written 7.0 years ago by asjo120

Thanks a lot, asjo, for the quick and elegant answer. The output format is exactly what I need.

written 7.0 years ago by Fabsta120
7.0 years ago by
Damian Kao15k wrote:

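What's great about using Python to output JSON is that native Python lists and dictionaries map directly onto JSON arrays and objects. Note, though, that print str(myStructure) gives Python's own literal syntax (single-quoted strings, as in the output below), which is close to JSON but not strictly valid; use json.dumps(myStructure) from the standard library if you need spec-compliant output.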
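
A quick way to see the difference between str() and json.dumps() on a nested structure. This is only a sketch; the structure is a fragment taken from the sample output in this answer, and the variable names are made up for the example.

```python
import json

# Fragment of the sample output, used here only as test data.
my_structure = {"Codonosigidae": ["Proterospongia", "Monosigabrevicollis"]}

as_repr = str(my_structure)         # Python literal syntax, single-quoted strings
as_json = json.dumps(my_structure)  # JSON text, double-quoted strings

print(as_repr)
print(as_json)

# A JSON parser rejects the repr form but round-trips the json.dumps form.
try:
    json.loads(as_repr)
    repr_is_valid_json = True
except ValueError:
    repr_is_valid_json = False

print(repr_is_valid_json)
```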

Like I said previously, I am not sure how you want the JSON to look as there are no standardized rules for writing Newick in JSON. I just made it output a simple key:value structure. For example the output of your sample would be something like: (I took out a bunch of data in the middle so I don't go over the post character limit)

{'Opisthokonta': ['Capsasporaowczarzaki', {'Codonosigidae': ['Proterospongia', 'Monosigabrevicollis']}, {'Metazoa': .....[bunch of stuff]}, {'Fungi': ['  Spizellomycespunctatus', 'Allomycesmacrogynus', 'Saccharomycescerevisiae', 'Phycomyces_blakesleeanus']}]}

Here is something in python without using BioPython (yes I was bored):

edit** This is just for fun. Use the BioPython/BioPerl solutions from the other answers if you want accurate results.

Not sure if it's 100% working with everything; however, it does work with your sample. It requires a root node.

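def parseNode(nwString):
    """Split "(child,child,...)label" into (label, [child strings])."""
    parenCount = 0

    key = ''
    processed = ''
    index = 0
    for char in nwString:
        if char == "(":
            parenCount += 1
            if parenCount == 1:
                # Skip the outermost "(". index is not advanced here,
                # so it trails the real position by one from now on.
                continue
        elif char == ")":
            parenCount -= 1
            if parenCount == 0:
                # Outermost ")": whatever follows it is this node's label.
                if index + 2 > len(nwString):
                    break
                else:
                    key = nwString[index + 2:]
                    break

        if char == ",":
            if parenCount != 1:
                # Mask commas inside nested clades so split(',') ignores them.
                processed += "|"
            else:
                processed += ","
        else:
            processed += char

        index += 1

    data = processed.split(',')

    for i in range(len(data)):
        data[i] = data[i].replace('|',',')

    return (key.strip(),data)

def recurseBuild(nwString):
    if nwString.find('(') == -1:
        # No parentheses left: a single leaf or a plain comma-separated list.
        if len(nwString.split(',')) == 1:
            return nwString
        else:
            return nwString.split(',')
    else:
        key, data = parseNode(nwString)

        dataArray = []
        for item in data:
            dataArray.append(recurseBuild(item))

        return {key:dataArray}

# Placeholder input: replace with your own Newick text, without the trailing ';'.
myNewickstring = "(A,B,(C,D)E)F"

result = recurseBuild(myNewickstring)

print(result)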
modified 7.0 years ago • written 7.0 years ago by Damian Kao15k
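
If the name/children layout from the Perl answer is preferred, the key:value structure above can be converted afterwards. A minimal sketch, assuming internal nodes are single-key dictionaries and leaves are plain strings; the function name to_name_children is made up for this example.

```python
import json

def to_name_children(node):
    """Convert {'F': ['A', {'E': [...]}]} style nodes to name/children objects."""
    if isinstance(node, dict):
        # An internal node is a single-key dict: label -> list of children.
        (name, children), = node.items()
        return {"name": name, "children": [to_name_children(c) for c in children]}
    # Anything else is treated as a leaf label.
    return {"name": node}

keyed = {"F": ["A", "B", {"E": ["C", "D"]}]}
print(json.dumps(to_name_children(keyed), indent=3))
```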

Thank you very much for sharing.

I adapted these functions to handle the optional branch lengths of the Newick format. Now I've got three keys for each node/leaf: a "label", a "distance" and a "tree" ("tree" contains the nested clades). Code available here:

written 6.1 years ago by zhch3330
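
For readers without access to that code, the three-key layout described in the comment above (label, distance, tree) could look roughly like the sketch below. This is not the linked code, only an independent illustration: the function name parse_newick is made up, leaves simply omit the "tree" key, a missing branch length becomes None, and quoted labels and bracketed comments are not handled.

```python
import json

def parse_newick(text):
    """Parse one Newick tree into nested dicts with label, distance and tree keys."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_clade():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip "("
            while True:
                children.append(parse_clade())
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                    continue
                pos += 1  # skip ")"
                break
        # Read "label[:distance]" up to the next structural character.
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        label, _, dist = text[start:pos].partition(":")
        node = {"label": label.strip(),
                "distance": float(dist) if dist.strip() else None}
        if children:
            node["tree"] = children
        return node

    return parse_clade()

print(json.dumps(parse_newick("(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;"), indent=2))
```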
7.0 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum121k wrote:

Here is a C lex/yacc solution:

The bison parser:

The flex lexer:

The Makefile:

bison -d newick.y
flex newick.l
gcc -Wall -O3 lex.yy.c


./a.out < input.newick.txt | fold -w 60


Speed?

$ time (x=1; while [ $x -le 1000 ]; do  ./a.out < input.newick.txt > /dev/null ; x=$(( $x + 1)); done )

real    0m2.876s
user    0m0.192s
sys    0m0.516s
written 7.0 years ago by Pierre Lindenbaum121k