Neighbor joining and unrooted tree

0

9.4 years ago

l.roca ▴ 10

Hi,

Is the result of neighbor joining always an unrooted phylogenetic tree?

Thanks

Neighbor-joining • 9.3k views

Updated 2.2 years ago by Ram 43k • written 9.4 years ago by l.roca ▴ 10

2

Yes.......

Updated 2.2 years ago by Ram 43k • written 9.4 years ago by lh3 33k

Ram · Answer 1 · 2014-12-03

1

9.4 years ago

Brice Sarver ★ 3.8k

I provided a more thorough answer to this question in the comments of another thread (How to perform phylogeny analyses). I'll reproduce it here.

Both NJ and UPGMA use distances to construct a tree. NJ trees are unrooted, whereas UPGMA trees are rooted, because UPGMA is a clustering approach. The biggest difference is that UPGMA assumes a constant rate of evolution across lineages, i.e., a molecular clock; because this assumption is often violated in empirical datasets, UPGMA is usually considered sub-optimal.

Since the NJ tree will be unrooted, sequences/taxa will be polarized based on how you perform the rooting. The placement of the root is one of the most difficult parts of estimating a tree. Depending on your question, an unrooted NJ tree may be appropriate for what you need. Alternatively, I'd recommend estimating a high-quality tree using a Bayesian approach such as BEAST, which places a prior on the age of the root.
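To make the "unrooted" point concrete, here is a minimal pure-Python sketch of the textbook neighbor-joining procedure. It is not how any particular package implements it; the function name `neighbor_joining`, the internal node labels, and the toy distance matrix are all made up for illustration. The thing to notice is that the last three nodes are joined at a single trifurcation, so no node ever ends up as a degree-2 root.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return an unrooted tree as a list of (internal_node, neighbor, branch_length).

    names: list of at least three taxon labels
    dist:  dict mapping frozenset({a, b}) -> pairwise distance
    """
    nodes = list(names)
    d = dict(dist)
    edges = []
    next_id = 0
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node from all the others.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair that minimises the Q criterion.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        u = f"node{next_id}"  # hypothetical label for the new internal node
        next_id += 1
        len_i = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        len_j = dij - len_i
        edges += [(u, i, len_i), (u, j, len_j)]
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                )
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes remain: connect them to one central node (a trifurcation).
    a, b, c = nodes
    dab, dac, dbc = d[frozenset((a, b))], d[frozenset((a, c))], d[frozenset((b, c))]
    centre = f"node{next_id}"
    edges += [
        (centre, a, (dab + dac - dbc) / 2),
        (centre, b, (dab + dbc - dac) / 2),
        (centre, c, (dac + dbc - dab) / 2),
    ]
    return edges


# Toy, additive distance matrix (illustrative values only).
names = ["a", "b", "c", "d", "e"]
pairs = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
dist = {frozenset(p): v for p, v in pairs.items()}
edges = neighbor_joining(names, dist)
for internal, neighbor, length in edges:
    print(internal, neighbor, length)
```

With n taxa you get 2n - 3 branches and every internal node has exactly three neighbors. Any root you see on an NJ tree was added afterwards (outgroup, midpoint, or just an arbitrary choice by the drawing program).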

Updated 2.2 years ago by Ram 43k • written 9.4 years ago by Brice Sarver ★ 3.8k

1

Are Bayesian methods considered the best nowadays? When I worked on TreeFam ~8 years ago, I was trying to avoid Bayesian methods. Besides efficiency, the major problem was that MrBayes significantly overestimated confidence values. This was not solved at that time. Are Bayesian confidence values good these days?

On rooting, the simplest approach is to place the root in the middle of the longest leaf-to-leaf path. This also assumes a clock, but it usually works well in my experience. For gene trees with a known species tree, minimizing the number of duplication and gene-loss events frequently works better in my hands. Of course, there is the outgroup approach if the outgroup is known for sure.
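For anyone who wants to see the midpoint idea spelled out, here is a rough sketch, assuming the tree is given as a list of (node, node, length) branches with non-negative lengths (NJ can produce negative ones, which this does not handle). The function names and the toy tree are hypothetical. It finds the longest leaf-to-leaf path with two traversals and then walks back to the halfway point.

```python
from collections import defaultdict


def farthest(adj, start):
    """Return (farthest node, distances from start, parent map) via a traversal."""
    dist = {start: 0.0}
    parent = {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr, length in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    return max(dist, key=dist.get), dist, parent


def midpoint_root(edges):
    """Return (near, far, offset): the root sits on branch near-far, `offset` from near."""
    adj = defaultdict(list)
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))
    # Two passes give the endpoints of the longest leaf-to-leaf path.
    end_a, _, _ = farthest(adj, edges[0][0])
    end_b, dist, parent = farthest(adj, end_a)
    half = dist[end_b] / 2
    # Walk from end_b back toward end_a until the next step would pass the midpoint.
    node = end_b
    while dist[parent[node]] > half:
        node = parent[node]
    near = parent[node]
    return near, node, half - dist[near]


# Toy unrooted tree: leaves A-D, internal nodes x and y (illustrative values only).
tree = [("x", "A", 1.0), ("x", "B", 2.0), ("x", "y", 1.0),
        ("y", "C", 2.0), ("y", "D", 6.0)]
near, far, offset = midpoint_root(tree)
print(near, far, offset)
```

In this toy tree the longest path runs from B to D (total length 9), so the root lands on the long branch leading to D. That also shows where the clock assumption bites: a single fast-evolving lineage pulls the root onto its own branch.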

Updated 2.2 years ago by Ram 43k • written 9.4 years ago by lh3 33k

0

Bayesian methods often provide more information, including allowing the user to explicitly quantify (and possibly integrate over) uncertainty via the posterior distributions of parameter estimates. BEAST, for example, can produce estimates of a lot of additional parameters (like net diversification rate and relative extinction rate based on the tree prior) while simultaneously estimating distributions of among-lineage relative/absolute rates. These approaches are computationally tractable nowadays, except for large datasets, which often require the use of approximate likelihood approaches.

Midpoint rooting may be appropriate, but there are several applications for which it may not be. Fully Bayesian approaches allow users to place informed priors on the age of the root (or other nodes) and also restrict tree space by assuming a monophyletic ingroup. Since rooting an unrooted tree is an explicit hypothesis regarding the evolutionary history of a group (and the characterization of the associated ancestral states), it can be tricky. When I mentioned that determining the placement of the root is difficult, I meant that in a statistical and computational sense; figuring out where the root goes is up there with calculating variances on branch lengths.

A lot has happened in the past four years, let alone the past 8! Super cool time to be doing phylogenetics.

Updated 2.2 years ago by Ram 43k • written 9.4 years ago by Brice Sarver ★ 3.8k

0

Thanks, but my question is not answered: are confidence values still overestimated? Are there papers about this topic? How do MrBayes/BEAST confidence values compare to bootstrap values and to the aLRT and aBayes supports used by PhyML?

Updated 2.2 years ago by Ram 43k • written 9.4 years ago by lh3 33k

0

Ah, okay! I know a bit about this.

There has been some work relating error in the branch lengths to posterior probabilities. A series of studies (e.g., Brown et al., Marshall, and citations therein) also investigated the presence of longer branches in Bayesian trees; this is attributed to the choice of priors. However, early work by Mike Alfaro and others suggests posterior probabilities are about as good as the ML bootstrap and less biased.

This isn't something that I follow extremely closely, but recent studies accept the notion that posterior support values are inflated. Furthermore, when there is character conflict, both posterior and bootstrap support values are affected. Whereas some ML programs allow users to collapse short branches into hard (or soft) polytomies, many Bayesian programs do not do so. This results in bifurcations that are 'real' in that they exist in the tree but result from stochasticity during inference and should be collapsed using an aLRT cutoff (as suggested in the previous study) prior to support calculation.

In my professional experience, the impact of inflated support values can be mitigated by selecting a conservative cutoff, and in many circumstances the ability to quantify uncertainty in estimates across a posterior distribution of trees (say, for comparative phylogenetic inference) outweighs this downside.
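As a rough illustration of what "a conservative cutoff" means in practice: support for a clade is simply the fraction of sampled trees that contain it, and you keep only the clades above some threshold. The sketch below assumes each sampled tree has already been reduced to a set of clades (sets of taxon names); the function names, the 0.95 default, and the four-tree sample are all made up for the example. For unrooted trees you would count bipartitions instead of clades.

```python
from collections import Counter


def clade_support(trees):
    """Fraction of sampled trees containing each clade.

    trees: iterable of trees, each a collection of clades (frozensets of taxa).
    """
    trees = [set(tree) for tree in trees]
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: n / len(trees) for clade, n in counts.items()}


def well_supported(support, cutoff=0.95):
    """Keep only clades whose support meets the (assumed) cutoff."""
    return {clade for clade, p in support.items() if p >= cutoff}


# Toy "posterior sample" of four trees on taxa A-D (illustrative only).
AB, ABC, CD = frozenset("AB"), frozenset("ABC"), frozenset("CD")
sample = [{AB, ABC}, {AB, ABC}, {AB, CD}, {AB, ABC}]
support = clade_support(sample)
kept = well_supported(support)
print(support[AB], support[ABC], support[CD], kept)
```

Here only the A+B clade survives the cutoff. Raising the threshold is a blunt way to guard against inflated values, but it keeps the full sample available for anything downstream that needs to integrate over tree uncertainty.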

I'd love to hear any additional thoughts you may have on this topic!

Updated 2.2 years ago by Ram 43k • written 9.4 years ago by Brice Sarver ★ 3.8k