Hi all,
I ran MC-UPGMA (Memory Constrained UPGMA) to obtain a cluster tree. The output tree has four fields per line: 'cluster_id1 cluster_id2 distance cluster_id3'. Each line defines one merge of two clusters in the binary clustering tree: cluster_id1 and cluster_id2 identify the pair of merged clusters, and cluster_id3 is the identifier of the new cluster, their union. cluster_id3 appears later as one of the first two fields when it is merged in turn. This format resembles the output of the MATLAB command 'linkage', with an additional 4th field.
Here is an example of the first 10 output lines:
98 99 0 103
89 101 0 104
81 102 0 105
69 95 0 106
54 55 0 107
53 107 0 108
43 93 0 109
40 59 0 110
30 38 0 111
29 70 0 112
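The format rules above can be checked mechanically before attempting any conversion. Below is a minimal Python sketch (the function name `check_merges` is hypothetical, not part of MC-UPGMA). It verifies that each line has four fields with a numeric distance, that each new id is created only once and before it is merged, and that no id is merged twice. It then returns the leaves (ids never created by a merge) and the roots (ids never merged again). A complete tree has exactly one root; the ten lines above are only the start of the output, so within that snippet it reports 19 leaves and 9 roots.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def check_merges(lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Check 'cluster_id1 cluster_id2 distance cluster_id3' lines.

    Returns (leaves, roots): leaves are ids that are merged but never
    created; roots are ids that are created but never merged again.
    """
    created: Set[str] = set()
    used = Counter()
    for lineno, line in enumerate(lines, 1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 fields, got {len(fields)}")
        a, b, dist, new = fields
        float(dist)  # raises ValueError if the distance is not numeric
        # A new cluster id must not have appeared anywhere before this line.
        if new in created or new in used:
            raise ValueError(f"line {lineno}: id {new} already appeared earlier")
        for child in (a, b):
            used[child] += 1
            if used[child] > 1:
                raise ValueError(f"line {lineno}: id {child} is merged twice")
        created.add(new)
    leaves = set(used) - created
    roots = created - set(used)
    return leaves, roots
```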
I would like to convert this file into a tree in Newick format.
Thanks!
Is your output already sorted by branch depth, bottom up, starting with the leaves? Is BioPerl OK? If so, you can build the tree with the Bio::Tree module and write it in Newick format with Bio::TreeIO. If it is not sorted that way, building the tree is a little more complicated, but still possible.
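For anyone who would rather not use BioPerl, here is a minimal Python sketch of the conversion. It is not the BioPerl approach and not part of MC-UPGMA, and `merges_to_newick` is a hypothetical name. It stores the two children of each cluster_id3 and takes the root to be the only id that is never merged again, so it does not depend on the line order. Assumptions: the distance field is the height of the merge, leaves sit at height 0, each branch length is the parent height minus the child height (some UPGMA conventions halve the distance instead), and leaf ids are used directly as labels.

```python
from typing import Dict, Iterable, Tuple


def merges_to_newick(lines: Iterable[str]) -> str:
    """Convert 'cluster_id1 cluster_id2 distance cluster_id3' lines to Newick.

    Assumption: the distance field is the merge height, leaves are at
    height 0, and branch length = parent height - child height.
    """
    children: Dict[str, Tuple[str, str]] = {}
    height: Dict[str, float] = {}
    merged = set()
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        a, b, dist, new = fields
        children[new] = (a, b)
        height[new] = float(dist)
        merged.update((a, b))

    # The root is the only cluster that is created but never merged again,
    # so the order of the input lines does not matter.
    roots = [c for c in children if c not in merged]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    root = roots[0]

    # Iterative post-order traversal, because a deep (unbalanced) tree
    # could exceed Python's default recursion limit.
    text: Dict[str, str] = {}
    stack = [(root, False)]
    while stack:
        node, ready = stack.pop()
        if node not in children:
            text[node] = node  # leaf: use its id as the label
        elif not ready:
            stack.append((node, True))  # revisit once both children are done
            stack.extend((child, False) for child in children[node])
        else:
            parts = []
            for child in children[node]:
                length = height[node] - height.get(child, 0.0)
                parts.append(f"{text.pop(child)}:{length:g}")
            text[node] = "(" + ",".join(parts) + ")"
    return text[root] + ";"
```

Usage, with a hypothetical file name: `with open("mcupgma_tree.txt") as fh: print(merges_to_newick(fh))`. For the two merges `1 2 0.5 4` and `4 3 2 5` it returns `((1:0.5,2:0.5):1.5,3:2);`. Leaf ids would still need to be mapped back to sequence names if the input used numeric indices.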
Finally, I decided to use another tool. Thank you for your help!