Question: Can I Choose An Outgroup Based On Phylogenetic Tree Structure Alone?
John wrote (8.3 years ago):

A reviewer has complained that my choice of outgroup in my phylogenetic tree may be causing long-branch attraction. He/she has suggested rooting my tree with another species. I have chosen one that is basal to the rest of the species in my first tree. Is this a good enough reason to use it as the outgroup in my new tree? I originally chose the first outgroup because the fossil record shows that this species evolved before the others, but for my new choice of outgroup I have nothing but a tree to base this on. Is that a valid reason to choose it as the new outgroup?


Dave Lunt (Hull, UK) wrote (8.3 years ago):

John, I think that your new outgroup choice will be fine, possibly even better. The general approach is to choose something close to the ingroup, but with no possibility of actually falling within the ingroup you are studying. It would be better still if you had support values showing that the new outgroup is confidently excluded from the ingroup.

More generally, it is good to treat this as an experiment. Your hypothesis could be "outgroup choice does not significantly influence ingroup phylogenetic relationships". Try including a range of outgroups, and also try midpoint rooting. The tree you show has a number of additional credible-looking outgroups at the top. If your ingroup phylogenetic hypothesis is correct and strongly supported, changing the outgroup (OG) shouldn't make much difference.
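One way to score this experiment is to infer a tree with each candidate outgroup set and count how many ingroup bipartitions differ between them, i.e. a Robinson-Foulds distance restricted to the ingroup. Simply re-rooting one existing tree cannot change these splits, so the trees being compared need to be re-inferred. The sketch below assumes trees written as nested tuples of taxon names (all names here are hypothetical); a real analysis would parse the Newick output of your tree program with a library such as ETE.

```python
from typing import FrozenSet, List, Set, Union

# Hypothetical format: a rooted tree as nested tuples of leaf names,
# e.g. ("OG", (("A", "B"), "C")). A real analysis would parse Newick files.
Tree = Union[str, tuple]


def collect_clades(tree: Tree, out: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Return the leaf set under `tree`, appending every clade's leaf set to `out`."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.append(leaves)
    return leaves


def ingroup_splits(tree: Tree, ingroup: Set[str]) -> Set[FrozenSet[str]]:
    """Non-trivial bipartitions of the ingroup implied by `tree`.

    Outgroup taxa and the root position are ignored, so two trees differ here
    only if the relationships within the ingroup differ.
    """
    clades: List[FrozenSet[str]] = []
    collect_clades(tree, clades)
    ingroup = frozenset(ingroup)
    anchor = min(ingroup)  # fixed taxon: store each split as the side without it
    splits = set()
    for clade in clades:
        side = clade & ingroup
        if anchor in side:
            side = ingroup - side
        if 2 <= len(side) <= len(ingroup) - 2:
            splits.add(side)
    return splits


def ingroup_rf(tree_a: Tree, tree_b: Tree, ingroup: Set[str]) -> int:
    """Robinson-Foulds distance restricted to the ingroup (0 = same ingroup topology)."""
    return len(ingroup_splits(tree_a, ingroup) ^ ingroup_splits(tree_b, ingroup))


ingroup = {"A", "B", "C", "D", "E"}
with_og1 = ("OG1", (("A", "B"), ("C", ("D", "E"))))
with_og2 = ("OG2", ((("A", "B"), "C"), ("D", "E")))  # same unrooted ingroup
with_og3 = ("OG2", (("A", "C"), ("B", ("D", "E"))))  # A/B/C rearranged
print(ingroup_rf(with_og1, with_og2, ingroup))  # 0
print(ingroup_rf(with_og1, with_og3, ingroup))  # 2
```

A distance of 0 across several outgroup choices is the outcome Dave describes for a robust ingroup hypothesis; non-zero distances point to the specific ingroup splits that move.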

You can perhaps minimise long-branch attraction by using a group as the outgroup rather than a single OTU, as this will break up the long branch. You must of course know that group to be a real grouping, or else the whole tree could be skewed by artificially forcing it to exist.

Best of luck

EDIT: I didn't really answer your question. Yes, it is fine to pick the OG based on a tree rather than on fossil or other evidence.

lh3 (United States) wrote (8.3 years ago):

When you build a eukaryotic gene tree and the species tree is known, speciation duplication inference (SDI) frequently (though not always) works better than midpoint rooting. Midpoint rooting assumes a molecular clock and may fail if this assumption is strongly violated. SDI assumes infrequent gene losses towards the true root of the tree, which does not always hold either, but it works better in practice.
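To make the species-tree idea concrete: under the LCA mapping used in SDI-style reconciliation, a gene-tree node is called a duplication when it maps to the same species-tree node as one of its children. A common way to root with a known species tree is to try every gene-tree edge as the root and keep the rooting with the fewest inferred duplications. This is a minimal sketch of that criterion, not the SDI program itself; it assumes a rooted species tree as nested tuples and an unrooted gene tree as an adjacency map, and all gene, species and node names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

SpeciesTree = Union[str, tuple]  # rooted, nested tuples of species names
GeneTree = Dict[str, List[str]]  # unrooted: node -> neighbouring nodes
Clade = FrozenSet[str]


def species_clades(tree: SpeciesTree, out: List[Clade]) -> Clade:
    """Collect every clade (set of species) of the rooted species tree."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(species_clades(child, out) for child in tree))
    out.append(leaves)
    return leaves


def lca(species: Clade, clades: List[Clade]) -> Clade:
    """Smallest species-tree clade containing all of `species` (the LCA mapping)."""
    return min((c for c in clades if species <= c), key=len)


def count_duplications(gene_tree: GeneTree, gene_species: Dict[str, str],
                       clades: List[Clade], u: str, v: str) -> int:
    """Root the gene tree on edge u-v and count nodes inferred as duplications."""
    dups = 0

    def visit(node: str, parent: str) -> Clade:
        nonlocal dups
        if node in gene_species:  # a gene (leaf)
            return lca(frozenset([gene_species[node]]), clades)
        child_maps = [visit(n, node) for n in gene_tree[node] if n != parent]
        mapped = lca(frozenset().union(*child_maps), clades)
        if mapped in child_maps:  # maps to the same species node as a child
            dups += 1
        return mapped

    left, right = visit(u, v), visit(v, u)
    if lca(left | right, clades) in (left, right):  # the new root node itself
        dups += 1
    return dups


def best_root(gene_tree: GeneTree, gene_species: Dict[str, str],
              species_tree: SpeciesTree) -> Tuple[int, Tuple[str, str]]:
    """Try every edge as the root; return (fewest duplications, edge)."""
    clades: List[Clade] = []
    species_clades(species_tree, clades)
    edges = sorted({tuple(sorted((a, b))) for a in gene_tree for b in gene_tree[a]})
    return min((count_duplications(gene_tree, gene_species, clades, a, b), (a, b))
               for a, b in edges)


species_tree = (("human", "mouse"), "fly")
gene_tree = {"n1": ["h1", "m1", "f1"], "h1": ["n1"], "m1": ["n1"], "f1": ["n1"]}
gene_species = {"h1": "human", "m1": "mouse", "f1": "fly"}
print(best_root(gene_tree, gene_species, species_tree))  # (0, ('f1', 'n1'))
```

Counting duplications alone often produces ties between rootings; loss-aware scores are a common way to break them, which connects to the assumption about gene losses mentioned above.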

jhc wrote (8.3 years ago):

I just wanted to comment that long-branch attraction (LBA) is not caused by outgroup selection. If one or more species are causing this effect, you will not be able to remove it just by re-rooting the tree with a different outgroup. LBA consists of fast-evolving species grouping together regardless of their evolutionary relationships. To detect this effect, fast-evolving species should be removed from the analysis (when possible) and the result compared with your current one. Increasing or modifying taxon sampling can also help to detect and reduce LBA. Finally, there are specific models and corrections designed to mitigate LBA.
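Deciding which species count as "fast-evolving" is a judgement call. One rough screen is to flag ingroup taxa whose root-to-tip distance is much larger than is typical. The sketch below assumes a hypothetical nested-tuple tree format with branch lengths in substitutions per site, and uses an arbitrary cutoff of 1.5 times the median root-to-tip distance; both the format and the cutoff are illustrative assumptions, not an established test. Outgroups are left out because their branches are long by construction.

```python
from statistics import median
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical format: a leaf is (name, branch_length); an internal node is
# (list_of_children, branch_length). The root's own branch length is 0.
Node = Tuple[Union[str, list], float]


def root_to_tip(node: Node, depth: float = 0.0,
                out: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Sum of branch lengths from the root to every leaf."""
    if out is None:
        out = {}
    label, length = node
    depth += length
    if isinstance(label, str):
        out[label] = depth
    else:
        for child in label:
            root_to_tip(child, depth, out)
    return out


def flag_long_branches(tree: Node, factor: float = 1.5) -> List[str]:
    """Taxa whose root-to-tip distance exceeds `factor` x the median (arbitrary cutoff)."""
    dist = root_to_tip(tree)
    cutoff = factor * median(dist.values())
    return sorted(name for name, d in dist.items() if d > cutoff)


ingroup_tree: Node = ([("A", 0.10), ("B", 0.12),
                       ([("C", 0.10), ("D", 0.60)], 0.05),
                       ("E", 0.11)], 0.0)
print(flag_long_branches(ingroup_tree))  # ['D']
```

Flagged taxa are candidates to remove before re-running the analysis; if the ingroup topology changes once they are gone, LBA is a plausible explanation for the original result.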

One review of LBA explains the problem in detail and provides strategies to detect and solve it. For instance:

(...) It is argued that since outgroup taxa almost always represent long branches and are as such a hazard towards misplacing long branched ingroup taxa, phylogenetic analyses should always be run with and without the outgroups included. This will detect whether only the outgroup roots the ingroup or if it simultaneously alters the ingroup topology, in which case previous studies have shown that the latter is most often the worse. (...)

Tancata commented (8.3 years ago):

I agree with this. There are two aspects to this question. 1: Yes, it is OK to choose another outgroup based on the tree structure (since the tree has already been "polarized" by the first one). 2: To reduce the possibility of LBA, you need to break up the long branches with more taxon sampling or use a different phylogenetic model. You might also be able to align more positions reliably if you remove the more distantly related species.
Leszek (IIMCB, Poland) wrote (8.3 years ago):

You can try mid-point rooting:

"In the absence of a good outgroup the root may be positioned by assuming approximately equal evolutionary rates over all the branches. In this way the root is put at the midpoint of the longest pathway between two OTUs. This way of rooting is called mid-point rooting."

It's implemented in the ETE package, if you are familiar with Python.
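For reference, in ETE3 this is, to my knowledge, `t.set_outgroup(t.get_midpoint_outgroup())` on a loaded `Tree`. The underlying idea is small enough to sketch without any library: find the longest tip-to-tip path (the tree's diameter) and place the root halfway along it. This sketch assumes an unrooted tree stored as a weighted adjacency map with hypothetical node names, and it inherits the clock-like assumption noted earlier in the thread.

```python
from typing import Dict, List, Optional, Tuple

# Unrooted tree as a weighted adjacency map: node -> {neighbour: branch_length}.
Graph = Dict[str, Dict[str, float]]


def build_graph(edges: List[Tuple[str, str, float]]) -> Graph:
    """Build a symmetric adjacency map from (node, node, branch_length) triples."""
    graph: Graph = {}
    for u, v, length in edges:
        graph.setdefault(u, {})[v] = length
        graph.setdefault(v, {})[u] = length
    return graph


def farthest(graph: Graph, start: str) -> Tuple[str, float, Dict[str, Optional[str]]]:
    """Farthest node from `start`, its distance, and each node's parent on the way."""
    dist = {start: 0.0}
    parent: Dict[str, Optional[str]] = {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for nb, length in graph[node].items():
            if nb not in dist:  # a tree has exactly one path to each node
                dist[nb] = dist[node] + length
                parent[nb] = node
                stack.append(nb)
    far = max(dist, key=dist.get)
    return far, dist[far], parent


def midpoint(graph: Graph) -> Tuple[str, str, float]:
    """Midpoint of the longest tip-to-tip path.

    Returns (u, v, x): the root sits on branch u-v, at distance x from u.
    """
    a, _, _ = farthest(graph, next(iter(graph)))  # one end of the longest path
    b, diameter, parent = farthest(graph, a)      # the other end
    if diameter == 0:
        raise ValueError("tree has no positive-length path")
    half, walked, node = diameter / 2, 0.0, b
    while True:  # walk from b back toward a until half the diameter is covered
        prev = parent[node]
        length = graph[node][prev]
        if walked + length >= half:
            return node, prev, half - walked
        walked += length
        node = prev


edges = [("A", "n1", 1.0), ("B", "n1", 2.0), ("n1", "n2", 1.0),
         ("C", "n2", 1.0), ("D", "n2", 6.0)]
print(midpoint(build_graph(edges)))  # ('n2', 'D', 1.5)
```

Here the longest path is B to D (length 9), so the root lands 4.5 from each end: on the n2-D branch, 1.5 from n2. A single very long branch like D's decides where the root goes, which is why midpoint rooting can mislead when rates vary strongly.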

Tancata (Newcastle, UK) wrote (8.3 years ago):

To expand on jhc's answer (and my comment to it), to reduce the impact of LBA you can try:

  1. Removing the more distantly related (or fast-evolving) sequences from the alignment (e.g. your original outgroup plus those other early-diverging guys at the "top" of your figure) and re-aligning.

  2. Inferring the tree using a better model (depending on the one you used the first time around). Models like CAT, empirical CATs (e.g. CAT20, CAT50), UL3 etc. implemented in Phylobayes might be good. Phylobayes can also do posterior predictive simulations that are supposed to (roughly speaking) give you some idea of whether the model might be doing an adequate job of avoiding LBA.
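Step 1 can be sketched in plain Python: drop the chosen sequences from the existing alignment and strip the gap characters, so the remaining sequences can be re-aligned from scratch with whatever aligner you used originally. This assumes FASTA input in which `-` and `.` are gaps; the taxon names are hypothetical.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Record] = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def drop_and_degap(records: List[Record], exclude: Iterable[str]) -> List[Record]:
    """Remove excluded taxa and strip gap characters ('-' and '.') for re-alignment."""
    exclude = set(exclude)
    return [(name, seq.replace("-", "").replace(".", ""))
            for name, seq in records if name not in exclude]


def to_fasta(records: List[Record]) -> str:
    """Write (header, sequence) pairs back out as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


aligned = ">old_outgroup\nAC--GT\n>taxA\nAC-TGA\n>taxB\nACCTGA\n"
print(to_fasta(drop_and_degap(read_fasta(aligned), ["old_outgroup"])), end="")
```

Re-aligning rather than just deleting rows matters because columns that were hard to align only because of the divergent sequences may align cleanly once those sequences are gone.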

john.little0001 wrote (5.3 years ago):

Hi, I'd like to resurrect this topic and add a couple of related questions. I'm making an ML phylogeny for a small number of 18S sequences (3) belonging to an obscure group of subterranean amphipods, and am finding it difficult to find a suitable outgroup. Using two outgroups taken from the relevant literature gives very long (divergent) branches between the outgroups and the ingroup, and so an unsatisfactory topology. The deep ML nodes are unsupported. Could it be that my ingroup sequences are wrongly identified? And could the unsupported deeper nodes be an artefact of the low sample size in this case? (Note: there are very few GenBank sequences available for this group.)
