- Codeml first numbers the sequences in the order in which they appear in the multiple alignment. So if you have the following alignment file (.aa):
mouse
SDASDASDASD
human
WEFNWEPFWNF
chimp
ASDADAAFAF
horse
WRNWEPRWWR
chicken
QEQRTWEGGF
Then, codeml would assign node number 1 to mouse, number 2 to human, number 3 to chimp, number 4 to horse, and finally number 5 to chicken.
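To make the tip numbering concrete, here is a small Python sketch of the same bookkeeping. It is not anything codeml runs; the function name `number_tips` is made up for illustration, and it assumes the simplified layout shown above (a name line followed by a sequence line), not a real PAML/PHYLIP file.

```python
def number_tips(alignment_text):
    """Map each sequence name to its position (1-based) in the alignment.

    Assumes the simplified layout used in this post: names and sequences
    alternate, one per line. This is an illustration, not codeml's parser.
    """
    lines = [ln.strip() for ln in alignment_text.splitlines() if ln.strip()]
    names = lines[0::2]  # every other non-empty line is a sequence name
    return {name: number for number, name in enumerate(names, start=1)}


example = """mouse
SDASDASDASD
human
WEFNWEPFWNF
chimp
ASDADAAFAF
horse
WRNWEPRWWR
chicken
QEQRTWEGGF
"""

print(number_tips(example))
# {'mouse': 1, 'human': 2, 'chimp': 3, 'horse': 4, 'chicken': 5}
```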
Codeml then numbers the internal nodes, picking up where it left off in the previous step, and it numbers the MOST ancestral node, at the root of the tree, first.
So the most ancestral node will get the number 6!
Then it continues numbering the internal (or ancestral) nodes in an increasing order while maintaining the ancestry relationships between the nodes (or species).
But how does Codeml know which node is more or less ancestral than another node? Well, it keeps track of this by writing the tree so that the most ancestral species appears leftmost (that is, at the beginning of the line that contains the tree [see note below]), with the rest following to the right in decreasing order of ancestry.
So let's assume that, of the five species we have here, chicken is the most ancestral; mouse and horse are related and come after chicken; and lastly, human and chimp are related and are the least ancestral, or most recent. Then the tree codeml reports would look something like this:
(5_chicken, ((1_mouse, 4_horse) 8 ,(2_human, 3_chimp) 9 ) 7) 6;
- One thing you should notice immediately about this representation is that the label of the most ancestral node, 6 in this example, appears at the far right, whereas the species are listed by ancestry starting from the far left, as mentioned above. Keep this in mind when interpreting the tree file.
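If you would rather read the node labels programmatically than by eye, a minimal sketch like the following may help. It assumes the tree is written exactly in the style above, with each internal label placed right after its closing parenthesis; `parse_labeled_tree` is a hypothetical helper written for this post, not part of PAML.

```python
def parse_labeled_tree(newick):
    """Return (root_label, {internal_label: [tips below that node]}).

    Assumes every internal node carries a label after its ")", as in the
    example tree in this post. Illustrative only; not a full Newick parser.
    """
    text = newick.strip().rstrip(";")
    pos = 0
    clades = {}

    def skip_spaces():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos].strip()

    def read_node():
        nonlocal pos
        skip_spaces()
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            tips = []
            while True:
                tips.extend(read_node()[1])
                skip_spaces()
                if pos < len(text) and text[pos] == ",":
                    pos += 1  # another child follows
                    continue
                break
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses in tree")
            pos += 1  # consume ")"
            label = read_label()  # the label written after ")"
            clades[label] = tips
            return label, tips
        label = read_label()  # a tip such as "1_mouse"
        return label, [label]

    root_label, _ = read_node()
    return root_label, clades


root, clades = parse_labeled_tree(
    "(5_chicken, ((1_mouse, 4_horse) 8 ,(2_human, 3_chimp) 9 ) 7) 6;"
)
print("root:", root)
for label, tips in clades.items():
    print(label, "->", tips)
```

For the example tree this reports 6 as the root label even though it is the last thing on the line, and it lists node 8 as the parent of mouse and horse and node 9 as the parent of human and chimp.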
Note: When you run codeml and ask for the ancestral reconstruction (RateAncestor = 1 in the control file), it will produce five output files, one of which is called 'rst'. This file contains the ancestral reconstruction.
At line number 15 (yes, always 15), you will find the tree view representation of your tree with the paml labels included. To view this tree, I suggest you use TreeView, which you can get for free at http://taxonomy.zoology.gla.ac.uk/rod/treeview.html
Copy that one-line tree from line 15 into a new text file and save it as mytree.trees.
Now go to TreeView, open this file. Then go to 'Trees' tab and choose 'Show internal node labels'. That's it.
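If you do this often, the copy-and-save step can be scripted. The sketch below is only that, a sketch: the function name `extract_labeled_tree` is made up, and it relies on the tree really being on line 15 of your rst file, so it checks that the line at least looks like a tree before writing anything.

```python
from pathlib import Path


def extract_labeled_tree(rst_path, out_path="mytree.trees", line_number=15):
    """Copy one line of the rst file into a new tree file.

    line_number=15 follows the description in this post; it is an
    assumption about the layout of your rst file, so the line is checked
    before it is written out.
    """
    lines = Path(rst_path).read_text().splitlines()
    if len(lines) < line_number:
        raise ValueError(f"{rst_path} has fewer than {line_number} lines")
    tree = lines[line_number - 1].strip()
    # A labeled tree should start with "(" and end with ";".
    if not (tree.startswith("(") and tree.endswith(";")):
        raise ValueError(f"line {line_number} does not look like a tree: {tree[:40]!r}")
    Path(out_path).write_text(tree + "\n")
    return tree
```

Calling `extract_labeled_tree("rst")` from the folder where codeml ran would then write mytree.trees, ready to open in TreeView. If it raises an error instead, open rst in a text editor and look for the labeled tree yourself.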
Thanks Carlos, but I don't understand your first instruction. What is the rst file (reStructuredText markup language? I guess not?). And in which program should I open that file in order to save it as a tree with node labels?
The rst file is one of the output files of Codeml (you will find it in the folder where you ran Codeml, once the analysis has finished). You can import it into Excel as a tab-delimited file if you are interested in the tables, or just open it in a text editor to do what Carlos said.
Cheers, Kartik
Hi Carlos. I was wondering if the labels are printed out in the rst file for every model in Codeml or only when we change the molecular clock? Because it never prints out the node-label tree in the rst file or in the main output file for me. I don't know if it has something to do with the models being used.