Question about reading phylogenetic trees
5.5 years ago
virus_n00b • 0

I am performing a phylogenetic analysis on a set of sequences. I performed a multiple sequence alignment (MSA) and constructed a phylogenetic tree from it.

I have inferred the following relationships:

  1. v2+v8, v6+v5, v10+v9, and v7+v4 are sister groups.
  2. v1 is ancestor for v2,v8,v6 and v5.
  3. v11 is ancestor for v9, v10, v1, v2, v8, v6 and v5.
  4. v3 is ancestor for v4, v7, v11, v9, v10, v1, v2, v8, v6 and v5.

Am I correct? If not, what are the correct relationships that can be inferred?

sequencing alignment • 1.5k views

Read the book on molecular evolution by Z Yang.

5.5 years ago
pevsner ▴ 420

Hi virus_n00b: great questions, but no, your statements are not correct. Here are some suggestions on how to think about phylogenetic trees.

  1. There are nodes that are either terminal (your data sequences v1 to v11) or internal (inferred ancestral sequences; some software packages allow you to see these inferred sequences, while in your tree they are indicated as solid orange circles). The tree may or may not have a node at the top (the "root"), which is the inferred ancestral sequence of everything else in the tree. In your case it appears that the tree is rooted, with the root placed as an orange circle along the left-most margin, halfway down the page.
  2. Every phylogenetic tree is defined by just two main properties: the branch lengths and the topology. I suggest you consult an article or textbook that explains this. For a great, brief article that explains some of the concepts in my response, see Baum et al. (Science, 2005), "Evolution. The tree-thinking challenge" (PMID 16284166). For your tree, I recommend using a software package (such as MEGA) that shows the units along the x-axis (e.g., the number of amino acid or nucleotide substitutions per site, or time).
  3. A clade is a group that contains any subset of your sequences (e.g. v2+v8) and their common ancestor, without excluding any descendants from that common ancestor. So the clade containing v2+v8 (what you called a sister group) must also include the common ancestor of those two sequences, which is not labeled but is given by the orange circle. A clade containing v2+v8+v6 must also include v5 and the three orange circles at the upper right of your figure.
  4. If you add variant 1 (v1), your clade now contains v8+v2+v6+v5+v1, their common ancestor (the orange circle from which all five sequences descend), and the three ancestral sequences shown as orange circles at the upper right. Variant 1 is itself a terminal node, an observed sequence at the tip of its own branch, so it is a member of this clade rather than its ancestor. Thus your statement that variant 1 is ancestral to v2, v8, v6, and v5 is definitely not correct.
  5. You wrote: "v11 is ancestor for v9, v10, v1, v2, v8, v6 and v5". No; again, v11 is a terminal node and cannot be the ancestor: the orange circle that connects to v11 is the common ancestor of v11 and of v9, v10, v1, v2, v8, v6, and v5.
  6. You wrote: "v3 is ancestor for v4, v7, v11, v9, v10, v1, v2, v8, v6 and v5". Here the situation is more complicated. Most phylogenetic trees are bifurcating: each internal node has two descendant branches. Your tree is multifurcating (at least one node has more than two descendant branches), which means it is not fully resolved. This is not necessarily a problem, but it means you need to be even more cautious in defining ancestors.
  7. Always be sure to use a high quality multiple sequence alignment as input. If you select random sequences you'll still be able to make an MSA and a tree, but it won't be biologically meaningful.
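
To make points 3 to 5 concrete, here is a minimal Python sketch. The topology is hypothetical: it is loosely reconstructed from the relationships described in this thread (the actual figure is not shown here), and the internal node names such as n_v2v8 and n_four are invented stand-ins for the orange circles. It represents the tree as parent/child links, finds the most recent common ancestor (MRCA) of a set of tips, and returns a clade as the MRCA plus all of its descendants.

```python
from typing import Dict, List, Set

# HYPOTHETICAL topology, loosely reconstructed from the relationships discussed
# in this thread. Internal nodes (names starting with "n_") stand in for the
# orange circles (inferred ancestors); their names are invented for illustration.
# "root" has three descendant branches, i.e. it is an unresolved multifurcation.
CHILDREN: Dict[str, List[str]] = {
    "root": ["v3", "n_v7v4", "n_v11"],
    "n_v7v4": ["v7", "v4"],
    "n_v11": ["v11", "n_core"],
    "n_core": ["n_v9v10", "n_v1"],
    "n_v9v10": ["v9", "v10"],
    "n_v1": ["v1", "n_four"],
    "n_four": ["n_v2v8", "n_v6v5"],
    "n_v2v8": ["v2", "v8"],
    "n_v6v5": ["v6", "v5"],
}

# Invert the child lists so each node can look up its parent.
PARENT: Dict[str, str] = {c: p for p, kids in CHILDREN.items() for c in kids}


def is_tip(node: str) -> bool:
    """Observed sequences are terminal nodes: they have no descendants."""
    return node not in CHILDREN


def ancestors(node: str) -> List[str]:
    """All nodes on the path from node's parent up to the root, nearest first."""
    path: List[str] = []
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def mrca(tips: List[str]) -> str:
    """Most recent common ancestor: the nearest node shared by every tip's ancestor path."""
    shared = set(ancestors(tips[0]))
    for tip in tips[1:]:
        shared &= set(ancestors(tip))
    # ancestors() is ordered nearest-first, so the first shared node is the most recent.
    return next(a for a in ancestors(tips[0]) if a in shared)


def clade(tips: List[str]) -> Set[str]:
    """A clade: the MRCA of the tips plus *all* of its descendants (tips and internal nodes)."""
    members: Set[str] = set()
    stack = [mrca(tips)]
    while stack:
        node = stack.pop()
        members.add(node)
        stack.extend(CHILDREN.get(node, []))
    return members


if __name__ == "__main__":
    group = clade(["v2", "v8", "v6"])
    print(sorted(n for n in group if is_tip(n)))      # ['v2', 'v5', 'v6', 'v8']
    print(sorted(n for n in group if not is_tip(n)))  # ['n_four', 'n_v2v8', 'n_v6v5']
    print(mrca(["v1", "v2", "v8", "v6", "v5"]))       # n_v1 (an internal node, not v1)
    print(is_tip("v1"), is_tip("v11"))                # True True
```

Because tips such as v1 and v11 have no children in this representation, mrca() can only ever return an internal node, which is the point made in items 4 and 5: an observed sequence sits at the end of its own branch and is never the ancestor of other sequences in the tree.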

Good luck!

Yes, got it. But one thing is still not clear to me: using a phylogenetic tree, how can you say how species have evolved (say, from v1 to v10)? If phylogenetic trees do not tell us how species evolve, which concept or method does?

Here are some concepts about evolution you can learn from your phylogenetic tree.

  • Given that your tree is constructed from sequences that are from different species, you can use your tree to ask which species last shared a common ancestor. In the case of mammals there is tremendous interest in knowing which species humans are most closely related to.
  • You can use your tree to estimate divergence times. Using the molecular clock hypothesis, and if possible calibrating your tree with data from the fossil record, you can estimate when particular species emerged. (For example, we shared a common ancestor with chimpanzees 5-7 million years ago [MYA], with rodents 80 MYA, with dogs 100 MYA, and with chickens 310 MYA.)
  • You can combine information from viruses and their hosts to infer the evolutionary history of HIV (learning how it emerged, over 100 years ago, from viruses that infected other primates), or, if this is a tree of cytomegaloviruses, you can infer the history of those viruses going back more than 400 MYA!
  • You can inspect a tree to see whether evolution is accelerated along any of its branches. (MEGA incorporates a really neat test of accelerated evolution and provides great documentation to help make it work.) Some researchers have studied genes that evolved very rapidly on the human lineage relative to chimpanzee and other primates, determined which of those genes are expressed in brain, and published prominent papers arguing that those genes may be important in our language and cognition. In trees of mammals it has been noted that some rodents undergo accelerated evolution relative to other animals. For insulin, the rate of evolution is quite similar across many species, but guinea pig and coypu (yes, that's coypu) selectively have a sevenfold higher rate. Why? Because, relative to insulin from human, mouse, rat, dog, and whale, guinea pig and coypu insulin evolved a distinct metal-binding capacity, altering the structural requirements of the insulin protein and leading to rapid changes at those amino acid positions.
  • You can study the neutral rate of evolution using your tree (the neutral theory of molecular evolution was introduced by Motoo Kimura in a 1968 Nature paper and developed in his 1983 book), perhaps by including pseudogene sequences or fossil repetitive element sequences in your tree.
  • You can look for evidence of positive or negative selection in your tree. Positively selected sequences accumulate substitutions at a rate higher than the neutral rate (negatively selected sequences, at a lower rate). Selection can also act on only part of a gene: for example, only the DNA encoding the active site of an enzyme might undergo positive (or negative) selection. In that case the entire branch might not display accelerated evolution, but a subset of each sequence you used (perhaps the region of DNA that encodes a favorite domain) can be studied for evolutionary change.
  • If your species are viruses (or any other organism), you can look for speciation events; perhaps there are quasispecies. For example, mosquitoes appear to be in the process of speciation right now. (For reference, they last shared a common ancestor with fruit flies 250 MYA, and it's kind of exciting that we happen to be living within the million-year window in which they are splitting into two new daughter species.)
  • A whole new sub-discipline of bioinformatics has emerged in which ancestral karyotypes are reconstructed, and your tree could help to do this. In a way our understanding of the distant past has been very fuzzy and is coming into sharper focus through molecular phylogeny.
  • Many species, from fungi to plants to fish to protozoans, have undergone whole genome duplication. That's a great way for an organism to double its repertoire of genes. (See Susumu Ohno's 1970 book Evolution by Gene Duplication.) Your tree can provide evidence for genome duplication. But don't look for it in humans: because we use an XX/XY sex chromosome system, doubling our genome would disrupt sex determination and lead to sterility, so that option is now closed to us.
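
The molecular-clock bullet boils down to simple arithmetic. Here is a minimal Python sketch, assuming a strict clock (the same substitution rate on every lineage), nucleotide sequences, and a Jukes-Cantor correction for multiple hits; the calibration distance, the 80-Myr date, and the toy sequences are made-up numbers, not real data. Because both lineages accumulate changes after a split, the distance between two sequences is d = 2rT, so a fossil-dated pair gives r = d_cal / (2 T_cal), and any other pair gives T = d / (2r).

```python
import math


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned sites that differ (sites with a gap in either sequence are skipped)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69(p: float) -> float:
    """Jukes-Cantor (1969) distance: corrects p for multiple substitutions at one site."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: sequences are saturated, distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def clock_rate(d_cal: float, t_cal_mya: float) -> float:
    """Substitutions/site/lineage/Myr from a pair with a fossil-dated split (d = 2rT)."""
    return d_cal / (2.0 * t_cal_mya)


def divergence_time(d: float, rate: float) -> float:
    """Estimated time since two lineages split, in Myr, under a strict clock."""
    return d / (2.0 * rate)


if __name__ == "__main__":
    # Hypothetical calibration: a pair whose split a fossil dates to 80 Myr ago,
    # with a corrected distance of 0.16 substitutions/site.
    r = clock_rate(0.16, 80.0)
    # Hypothetical query pair: two short aligned toy sequences differing at one site.
    d = jc69(p_distance("ACGTACGTACGTACGTACGT", "ACGTACGAACGTACGTACGT"))
    print(f"rate = {r:.4f} subs/site/lineage/Myr; split ~ {divergence_time(d, r):.1f} Myr ago")
```

Real dating analyses relax the strict-clock assumption and use several calibrations with uncertainty, so treat this only as an illustration of the underlying relationship.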
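
For the accelerated-evolution bullet, one classic approach is Tajima's (1993) relative rate test, which compares two ingroup sequences against an outgroup. As far as I know MEGA offers a version of Tajima's test, but the sketch here is a minimal stand-alone Python version, not MEGA's code, and it covers only the simple site-counting form: m1 counts sites where only sequence 1 has changed (sequence 2 still matches the outgroup), m2 counts sites where only sequence 2 has changed. Under equal rates m1 and m2 should be similar, and (m1 - m2)^2 / (m1 + m2) is approximately chi-square with 1 degree of freedom. The toy alignment is invented.

```python
import math
from typing import Tuple


def tajima_relative_rate(seq1: str, seq2: str, outgroup: str) -> Tuple[int, int, float, float]:
    """Tajima's (1993) site-counting relative rate test for two aligned ingroup sequences.

    Returns (m1, m2, chi2, p_value). A small p-value suggests the two lineages have
    evolved at different rates since they split from each other.
    """
    m1 = m2 = 0
    for a, b, c in zip(seq1, seq2, outgroup):
        if "-" in (a, b, c):
            continue  # skip gapped columns
        if a != b and b == c:
            m1 += 1  # change unique to the lineage leading to seq1
        elif a != b and a == c:
            m2 += 1  # change unique to the lineage leading to seq2
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0  # no informative sites: no evidence of a rate difference
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return m1, m2, chi2, p_value


if __name__ == "__main__":
    # Toy, made-up alignment: seq1 carries six unique changes, seq2 none.
    out = "AAAAAAAAAAAAAAAAAAAA"
    s2 = "AAAAAAAAAAAAAAAAAAAA"
    s1 = "CCCCCCAAAAAAAAAAAAAA"
    print(tajima_relative_rate(s1, s2, out))
```

Sites where the two ingroup sequences agree with each other but differ from the outgroup are ignored, since those changes happened on the shared branch or on the outgroup branch and say nothing about which ingroup lineage is faster.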

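The selection bullet is usually quantified as omega = dN/dS, the ratio of nonsynonymous to synonymous substitutions per site: omega > 1 suggests positive selection, omega < 1 negative (purifying) selection, and omega near 1 neutral evolution. Here is a simplified Python sketch that assumes you have already counted synonymous and nonsynonymous sites and differences for a pair of aligned coding sequences (for example with the Nei-Gojobori 1986 counting method) and applies a Jukes-Cantor correction to each. The counts in the example are invented, and the interpret() threshold is an arbitrary illustration, not a statistical test; dedicated tools such as codeml in PAML fit codon models by maximum likelihood and are what you would use in practice.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differences for multiple hits (Jukes-Cantor)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def omega(n_sites: float, s_sites: float, n_diffs: float, s_diffs: float) -> float:
    """dN/dS from pre-computed site and difference counts (Nei-Gojobori-style inputs)."""
    dn = jukes_cantor(n_diffs / n_sites)  # nonsynonymous substitutions per nonsynonymous site
    ds = jukes_cantor(s_diffs / s_sites)  # synonymous substitutions per synonymous site
    if ds == 0.0:
        raise ValueError("no synonymous differences: dN/dS is undefined")
    return dn / ds


def interpret(w: float, tol: float = 0.1) -> str:
    """Rough, hypothetical label only; real analyses test omega statistically."""
    if w > 1.0 + tol:
        return "positive selection?"
    if w < 1.0 - tol:
        return "purifying (negative) selection?"
    return "roughly neutral"


if __name__ == "__main__":
    # Invented counts for a 100-codon alignment (225 nonsynonymous + 75 synonymous sites).
    examples = [("gene A (invented)", (225, 75, 3, 10)), ("gene B (invented)", (225, 75, 20, 2))]
    for label, counts in examples:
        w = omega(*counts)
        print(f"{label}: dN/dS = {w:.2f} -> {interpret(w)}")
```

Computing omega over a whole gene averages over all codons, which is why the bullet suggests also looking at specific regions such as a domain or active site, where selection may be concentrated.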