Question: Newick 2 Json Converter (Preferably In Perl)
Fabsta wrote (5.1 years ago):

Hi! Does anyone know of a Perl snippet that converts a tree in Newick format into a JSON string?

I had a look at Bio::Phylo, but could not really find a solution.

Any help is much appreciated.

Here is an example Newick string (sorry for the length):

(Capsaspora_owczarzaki,(Proterospongia,Monosiga_brevicollis)Codonosigidae,(Amphimedon_queenslandica,Trichoplax_adhaerens,(((((((((((((Tupaia_belangeri,((Cavia_porcellus,(Ictidomys_tridecemlineatus,(Rattus_norvegicus,Mus_musculus)Murinae,Dipodomys_ordii)Sciurognathi)Rodentia,(Oryctolagus_cuniculus,Ochotona_princeps)Lagomorpha)Glires,((Otolemur_garnettii,Microcebus_murinus)Strepsirrhini,((((Nomascus_leucogenys,(Pongo_abelii,(Homo_sapiens,Pan_troglodytes,Gorilla_gorilla)Homininae)Hominidae)Hominoidea,Macaca_mulatta)Catarrhini,Callithrix_jacchus)Simiiformes,Tarsius_syrichta)Haplorrhini)Primates)Euarchontoglires,(Procavia_capensis,Loxodonta_africana,Echinops_telfairi)Afrotheria,((Pteropus_vampyrus,Myotis_lucifugus)Chiroptera,Equus_caballus,(Vicugna_pacos,Bos_taurus,Sus_scrofa,Tursiops_truncatus)Cetartiodactyla,(Felis_catus,(Ailuropoda_melanoleuca,Canis_lupus_familiaris)Caniformia)Carnivora,(Sorex_araneus,Erinaceus_europaeus)Insectivora)Laurasiatheria,(Dasypus_novemcinctus,Choloepus_hoffmanni)Xenarthra)Eutheria,(Monodelphis_domestica,Macropus_eugenii,Sarcophilus_harrisii)Metatheria)Theria,Ornithorhynchus_anatinus)Mammalia,(Anolis_carolinensis,(Taeniopygia_guttata,(Meleagris_gallopavo,Gallus_gallus)Phasianidae)Neognathae)Sauria)Amniota,Xenopus_tropicalis)Tetrapoda,((((Tetraodon_nigroviridis,Takifugu_rubripes)Tetraodontidae,(Gasterosteus_aculeatus,Oryzias_latipes)Smegmamorpha)Percomorpha,Gadus_morhua)Holacanthopterygii,Danio_rerio)Clupeocephala)Euteleostomi,Petromyzon_marinus)Vertebrata,Branchiostoma_floridae,(Ciona_savignyi,Ciona_intestinalis)Ciona)Chordata,Strongylocentrotus_purpuratus)Deuterostomia,(Lottia_gigantea,(Ixodes_scapularis,((((Atta_cephalotes,Apis_mellifera)Aculeata,(((Drosophila_virilis,Drosophila_mojavensis)Drosophila,Drosophila_grimshawi,(Drosophila_willistoni,(Drosophila_pseudoobscura,Drosophila_persimilis)pseudoobscura_subgroup,((Drosophila_yakuba,Drosophila_simulans,Drosophila_sechellia,Drosophila_melanogaster,Drosophila_erecta)melanogaster_subgroup,Drosophila_ananassae)melanogaster_group)Sophophora)Drosophila,(Anopheles_gambiae,(Culex_quinquefasciatus,Aedes_aegypti)Culicinae)Culicidae)Diptera,Bombyx_mori,Tribolium_castaneum)Endopterygota,(Pediculus_humanus,Acyrthosiphon_pisum)Paraneoptera)Neoptera,(Parhyale_hawaiensis,Daphnia_pulex)Crustacea)Pancrustacea)Arthropoda,(Capitella_teleta,Helobdella_robusta)Annelida)Protostomia)Coelomata,(Pristionchus_pacificus,(Caenorhabditis_japonica,Caenorhabditis_brenneri,Caenorhabditis_remanei,Caenorhabditis_elegans,Caenorhabditis_briggsae)Caenorhabditis)Chromadorea,Schistosoma_mansoni)Bilateria,(Nematostella_vectensis,Hydra_magnipapillata)Cnidaria)Eumetazoa)Metazoa,(Spizellomyces_punctatus,Allomyces_macrogynus,Saccharomyces_cerevisiae,Phycomyces_blakesleeanus)Fungi)Opisthokonta;

Thanks a lot in advance, Fabian

Tags: perl (post last modified 5.1 years ago by Damian Kao)

JSON is just a general data-structure format, whereas Newick is specific to trees. I don't think there are any standardized rules for representing Newick in JSON, so you'll have to tailor the layout to how you want to use the JSON data. Why do you want to convert it to JSON?

(Damian Kao, 5.1 years ago)
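To make that point concrete, here is a small Python sketch (Python being one of the languages used in the answers) of two equally valid JSON encodings of the tree (A,B,(C,D)E)F;. Neither the "name"/"children" layout nor the label-keyed layout is a standard; the helper functions leaves_nested and leaves_keyed are hypothetical and only show that both documents carry the same leaves.

```python
import json

# Encoding 1: every node is an object; internal nodes carry a "children" list.
nested = {
    "name": "F",
    "children": [
        {"name": "A"},
        {"name": "B"},
        {"name": "E", "children": [{"name": "C"}, {"name": "D"}]},
    ],
}

# Encoding 2: an internal node's label maps to the list of its children;
# leaves are plain strings.
keyed = {"F": ["A", "B", {"E": ["C", "D"]}]}


def leaves_nested(node):
    """Collect leaf names from encoding 1."""
    if "children" not in node:
        return [node["name"]]
    return [leaf for child in node["children"] for leaf in leaves_nested(child)]


def leaves_keyed(node):
    """Collect leaf names from encoding 2."""
    if isinstance(node, str):
        return [node]
    (children,) = node.values()  # exactly one key per internal node
    return [leaf for child in children for leaf in leaves_keyed(child)]


# Same tree, same leaves, but different JSON text.
print(json.dumps(nested))
print(json.dumps(keyed))
print(leaves_nested(nested), leaves_keyed(keyed))
```

Which layout is "right" depends entirely on the consumer, e.g. what a JavaScript tree-drawing library expects.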

Sounds fun. Do you have a sample file, please?

(Pierre Lindenbaum, 5.1 years ago)

I smell a round of code golf happening...

(Damian Kao, 5.1 years ago)

Yes, I updated the post. Looking forward to a solution :-)

(Fabsta, 5.1 years ago)

How do you expect the JSON to look?

(asjo, 5.1 years ago)
asjo wrote (5.1 years ago):

Here is an attempt, with the caveat that your example is really complicated, and that you haven't really defined what the JSON result should look like:

#!/usr/bin/perl

use strict;
use warnings;

use Bio::Phylo::IO;
use JSON;

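my $forest=Bio::Phylo::IO->parse(-file=>"example.newick", -format=>"newick");

while (my $tree=$forest->next) {
    my $out=[];          # top-level array that will hold the root node
    my $children=$out;   # array that newly visited nodes are appended to
    my $cur;             # hash for the most recently visited node
    $tree->visit_breadth_first(
                               # -pre: store the node's name and append it to the current list
                               -pre=>sub { $cur={ name=>shift->get_name }; push @$children, $cur; },
                               # -pre_daughter: give the current node a children array and make it the target list
                               -pre_daughter=>sub { $cur->{children}=[]; $children=$cur->{children} },
                              );
    print JSON->new->pretty->encode($out);
}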

If I run it on a smaller example (from Wikipedia's entry on the Newick format):

(A,B,(C,D)E)F;

I get this:

[
   {
      "name" : "F",
      "children" : [
         {
            "name" : "A"
         },
         {
            "name" : "B"
         },
         {
            "name" : "E",
            "children" : [
               {
                  "name" : "C"
               },
               {
                  "name" : "D"
               }
            ]
         }
      ]
   }
]

But again, I don't know how you want the JSON output formatted.


Thanks a lot, asjo, for the quick and elegant answer. The output format is exactly what I need.

(Fabsta, 5.1 years ago)
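For anyone who wants the same name/children layout without Bio::Phylo, here is a minimal recursive-descent sketch in Python. It assumes unquoted labels with no whitespace, quotes or [comments]. An optional :length is stored under a "length" key, which is a hypothetical name and not part of asjo's output. Underscores are kept as-is, although strict Newick reads them as spaces. It returns a single root object rather than the list printed by the Perl script.

```python
import json
import re

# A token is one structural character, or a run of other non-space
# characters (a label or a branch length).
TOKEN = re.compile(r"[(),:;]|[^(),:;\s]+")


def parse_newick(text):
    """Parse one Newick tree into nested {"name": ..., "children": [...]} dicts."""
    tokens = TOKEN.findall(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def subtree():
        children = []
        if peek() == "(":
            take()                          # consume '('
            children.append(subtree())
            while peek() == ",":
                take()                      # consume ','
                children.append(subtree())
            if take() != ")":
                raise ValueError("unbalanced parentheses")
        node = {"name": ""}
        if peek() not in ("(", ")", ",", ":", ";", None):
            node["name"] = take()           # leaf name or clade label
        if peek() == ":":
            take()                          # consume ':'
            node["length"] = float(take())  # optional branch length
        if children:
            node["children"] = children
        return node

    tree = subtree()
    if peek() == ";":
        take()
    if peek() is not None:
        raise ValueError("unexpected text after the tree")
    return tree


print(json.dumps(parse_newick("(A,B,(C,D)E)F;"), indent=3))
```

parse_newick is a hypothetical helper name. A recursive-descent parser follows the grammar directly (a subtree is a leaf or a parenthesized list of subtrees, followed by an optional label and length), which makes it easier to extend than string splitting.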
Damian Kao (UK) wrote (5.1 years ago):

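What's nice about using Python to output JSON is that native Python lists and dictionaries map directly onto JSON arrays and objects, so json.dumps(myStructure) serializes the nested structure as-is. (A plain str(myStructure) looks close, but it is not valid JSON: Python's repr uses single-quoted strings, which the JSON spec does not allow.)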
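A quick illustration of the difference (a minimal Python 3 sketch; the structure is just a toy tree):

```python
import json

tree = {"F": ["A", "B", {"E": ["C", "D"]}]}

as_str = str(tree)          # Python repr: single-quoted strings
as_json = json.dumps(tree)  # real JSON: double-quoted strings

print(as_str)   # {'F': ['A', 'B', {'E': ['C', 'D']}]}
print(as_json)  # {"F": ["A", "B", {"E": ["C", "D"]}]}

try:
    json.loads(as_str)
except json.JSONDecodeError as err:
    print("str() output is not valid JSON:", err)

# The json.dumps output round-trips back to the same structure.
print(json.loads(as_json) == tree)
```

str() also writes True, False and None where JSON expects true, false and null, so json.dumps is the safer choice.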

As I said before, I'm not sure how you want the JSON to look, since there are no standardized rules for representing Newick in JSON. I just made it output a simple key: value structure in which each internal node's label maps to a list of its children, and leaves are plain strings. For your sample, the output would look something like this (I took out a chunk of the middle so I don't go over the post character limit):

{"Opisthokonta": ["Capsaspora_owczarzaki", {"Codonosigidae": ["Proterospongia", "Monosiga_brevicollis"]}, {"Metazoa": .....[bunch of stuff]}, {"Fungi": ["Spizellomyces_punctatus", "Allomyces_macrogynus", "Saccharomyces_cerevisiae", "Phycomyces_blakesleeanus"]}]}

Here is something in Python without using BioPython (yes, I was bored):

Edit: This is just for fun. Use a proper parser, such as the Bio::Phylo solution in the other answer, if you want accurate results.

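I'm not sure it works with every Newick string, but it does work with your sample. It requires a root node, and the trailing ';' must be removed first. Branch lengths are not parsed separately; they stay attached to the labels.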

import json

def parseNode(nwString):
    """Split '(child,child,...)label' into (label, [child strings]).

    Commas inside nested parentheses are temporarily replaced by '|'
    so that only the top-level commas separate the children.
    """
    parenCount = 0
    key = ''
    processed = ''
    for index, char in enumerate(nwString):
        if char == "(":
            parenCount += 1
            if parenCount == 1:
                continue  # drop the outermost '('
        elif char == ")":
            parenCount -= 1
            if parenCount == 0:
                key = nwString[index + 1:]  # text after the matching ')' is the label
                break

        if char == "," and parenCount != 1:
            processed += "|"  # comma belonging to a nested clade
        else:
            processed += char

    data = [item.replace('|', ',') for item in processed.split(',')]
    return (key.strip(), data)

def recurseBuild(nwString):
    """Recursively convert a Newick (sub)string into nested dicts and lists."""
    if '(' not in nwString:
        parts = nwString.split(',')
        return parts[0] if len(parts) == 1 else parts  # a leaf label
    key, data = parseNode(nwString)
    return {key: [recurseBuild(item) for item in data]}

# Replace with your own tree; the trailing ';' has to be removed first.
myNewickstring = "(A,B,(C,D)E)F;".strip().rstrip(';')

result = recurseBuild(myNewickstring)

print(json.dumps(result))

Thank you very much for sharing.

I adapted these functions to handle Newick's optional branch lengths. Each node/leaf now has three keys: a "label", a "distance" and a "tree" ("tree" contains the nested clades). Code available here: http://pastebin.com/Pk717Uc2

(zhch3330, 4.2 years ago)
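The pastebin code isn't reproduced here, so as a rough sketch of the same idea only: the hypothetical helper below post-processes the {label: [children]} structure returned by the Python answer, where a branch length stays glued to its label (e.g. "A:0.1"), into nodes with "label", "distance" and "tree" keys as described above. It is not zhch3330's code; giving leaves an empty "tree" list and a missing length a distance of None are assumptions.

```python
import json


def split_label(raw):
    """Split 'name:length' into (name, float length); length is None if absent."""
    name, sep, length = raw.partition(":")
    return name, (float(length) if sep and length else None)


def to_label_distance_tree(node):
    """Convert {label: [children]} / plain-string leaves into label/distance/tree dicts."""
    if isinstance(node, str):                  # a leaf such as "A:0.1"
        label, distance = split_label(node)
        return {"label": label, "distance": distance, "tree": []}
    (raw_label, children), = node.items()      # one key per internal node
    label, distance = split_label(raw_label)
    return {"label": label,
            "distance": distance,
            "tree": [to_label_distance_tree(child) for child in children]}


# Shape of what the Python answer returns for "(A:0.1,B:0.2)F:0.3"
parsed = {"F:0.3": ["A:0.1", "B:0.2"]}
print(json.dumps(to_label_distance_tree(parsed), indent=2))
```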
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote (5.1 years ago):

Here is a C lex/yacc solution: https://gist.github.com/3056165

The bison parser: https://raw.github.com/gist/3056165/54220d50bcad1a72462bdb00dc258d6d472c7cfd/newick.y

The flex lexer: https://raw.github.com/gist/3056165/407f2924a533793b33c9c5b0d144ca7e1dd24e31/newick.l

The Makefile:

all:
	bison -d newick.y
	flex newick.l
	gcc -Wall -O3 newick.tab.c lex.yy.c

Test:

./a.out < input.newick.txt | fold -w 60

{"label":"Opisthokonta","children":[{"label":"Capsasporaowcz
arzaki"},{"label":"Codonosigidae","children":[{"label":"Prot
erospongia"},{"label":"Monosigabrevicollis"}]},{"label":"Met
azoa","children":[{"label":"Amphimedonqueenslandica"},{"labe
l":"Trichoplaxadhaerens"},{"label":"Eumetazoa","children":[{
"label":"Bilateria","children":[{"label":"Coelomata","childr
en":[{"label":"Deuterostomia","children":[{"label":"Chordata
(...)
[{"label":"Caenorhabditisjaponica"},{"label":"Caenorhabditis
brenneri"},{"label":"Caenorhabditisremanei"},{"label":"Caeno
rhabditiselegans"},{"label":"Caenorhabditisbriggsae"}]}]},{"
label":"Schistosomamansoni"}]},{"label":"Cnidaria","children
":[{"label":"Nematostellavectensis"},{"label":"Hydramagnipap
illata"}]}]}]},{"label":"Fungi","children":[{"label":"Spizel
lomycespunctatus"},{"label":"Allomycesmacrogynus"},{"label":
"Saccharomycescerevisiae"},{"label":"Phycomyces_blakesleeanu
s"}]}]}
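fold only inserts line breaks for display, so the raw output of a.out should be a single JSON document. Here is a minimal Python sketch for sanity-checking it, assuming the "label"/"children" layout shown above. The script name check_tree.py and the file tree.json are hypothetical.

```python
import json
import sys


def count_nodes(node):
    """Return (leaves, internal nodes) for a {"label": ..., "children": [...]} tree."""
    children = node.get("children", [])
    if not children:
        return 1, 0
    leaves, internal = 0, 1                # this node is internal
    for child in children:
        child_leaves, child_internal = count_nodes(child)
        leaves += child_leaves
        internal += child_internal
    return leaves, internal


if __name__ == "__main__" and len(sys.argv) > 1:
    # json.load raises json.JSONDecodeError if the file is not valid JSON.
    with open(sys.argv[1]) as handle:
        tree = json.load(handle)
    leaves, internal = count_nodes(tree)
    print(f"root={tree['label']} leaves={leaves} internal={internal}")
```

Usage: ./a.out < input.newick.txt > tree.json && python3 check_tree.py tree.json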

Speed? Here are 1,000 runs on the sample file:

$ time (x=1; while [ $x -le 1000 ]; do  ./a.out < input.newick.txt > /dev/null ; x=$(( $x + 1)); done )

real    0m2.876s
user    0m0.192s
sys    0m0.516s