Question

Can I Choose An Outgroup Based On Phylogenetic Tree Structure Alone?

12.6 years ago

John ▴ 790

A reviewer has complained that my choice of outgroup in my phylogenetic tree may be causing long branch attraction. He/she has suggested rooting my tree with another species. I have chosen one which is basal to the rest of the species in my first tree. Is this a good enough reason to use it as an outgroup in my new tree? I originally chose the first outgroup because the fossil record shows that this species evolved before the others. For my new choice of outgroup, however, I have nothing but the tree itself to go on. Is that a valid reason to choose it as the new outgroup?

< image not found >

phylogenetics phylogeny tree • 20k views

updated 2.3 years ago by Ram 43k • written 12.6 years ago by John ▴ 790

Ram · Answer 1 · 2011-10-10

John, I think that your new outgroup choice will be fine, possibly even better. The general approach is to choose something close to the ingroup, but with no possibility of being actually within the (in)group you are studying. It would be better if you had support values to show that the new outgroup is confidently excluded from the ingroup.

More generally, it is good to treat this as an experiment. Your hypothesis could be "outgroup choice does not significantly influence ingroup phylogenetic relationships". Try including a range of outgroups, and also try midpoint rooting. The tree you show has a number of additional credible-looking outgroups at the top. If your ingroup phylogenetic hypothesis is correct and strongly supported, changing the outgroup (OG) shouldn't make much difference.
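A minimal sketch of that comparison in plain Python. It assumes each rooted tree has already been inferred elsewhere and is written as nested tuples of taxon names; `ingroup_clades`, the outgroups `OG1` to `OG3` and the taxa `A` to `D` are hypothetical placeholders, not data from the question. For each tree it collects the clades that remain once outgroup taxa are ignored, so two trees give the same set only if the outgroup left the rooted ingroup relationships unchanged.

```python
def leaf_set(tree):
    """Return the frozenset of taxon names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def ingroup_clades(tree, ingroup):
    """Collect the non-trivial clades seen when only ingroup taxa are kept."""
    ingroup = frozenset(ingroup)
    clades = set()

    def visit(node):
        if isinstance(node, str):
            return
        restricted = leaf_set(node) & ingroup
        # Skip single taxa and the clade containing the whole ingroup.
        if 1 < len(restricted) < len(ingroup):
            clades.add(restricted)
        for child in node:
            visit(child)

    visit(tree)
    return clades


# Hypothetical trees, one per candidate outgroup.
ingroup = {"A", "B", "C", "D"}
trees = {
    "OG1": ("OG1", (("A", "B"), ("C", "D"))),
    "OG2": ("OG2", (("A", "B"), ("C", "D"))),
    "OG3": ("OG3", ("A", ("B", ("C", "D")))),
}

reference = ingroup_clades(trees["OG1"], ingroup)
for name, tree in trees.items():
    changed = ingroup_clades(tree, ingroup) ^ reference
    status = "ingroup unchanged" if not changed else sorted(map(sorted, changed))
    print(name, status)
```

Any outgroup that changes the clade set relative to the reference is worth a closer look, ideally alongside the support values for the affected nodes.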

You can perhaps minimise long branch attraction by using a group of taxa as the outgroup rather than a single OTU, as this will break up the long branch. You must of course know that group to be a real grouping; otherwise the whole tree could be skewed by artificially forcing it to exist.

Best of luck

EDIT: I didn't really answer your question. Yes, it is fine to pick an outgroup based on a tree rather than fossil or other evidence.

Ram · Answer 2 · 2011-10-10

When you build a eukaryotic gene tree and the species tree is known, speciation duplication inference (SDI) frequently (though not always) works better than midpoint rooting. Midpoint rooting assumes a molecular clock and may fail if this assumption is badly violated. SDI assumes that gene losses are infrequent towards the true root of the tree, which does not always hold either, but it works better in practice.

Ram · Answer 3 · 2011-10-11

I just wanted to comment that long branch attraction (LBA) is not caused by outgroup selection. If one or more species are causing this effect, you will not be able to remove it just by re-rooting the tree on a different outgroup. LBA consists of fast-evolving species grouping together regardless of their evolutionary relationship. To detect this effect, remove the fast-evolving species from the analysis (when possible) and compare the result with your current tree. Increasing or modifying the taxon sampling can also help to detect and reduce cases of LBA. Finally, there are specific models and corrections designed to mitigate LBA (http://www.biomedcentral.com/1471-2148/7/S1/S4).

This review explains the problem in detail, and provides strategies to detect and solve LBA. For instance:

(...) It is argued that since outgroup taxa almost always represent long branches and are as such a hazard towards misplacing long branched ingroup taxa, phylogenetic analyses should always be run with and without the outgroups included. This will detect whether only the outgroup roots the ingroup or if it simultaneously alters the ingroup topology, in which case previous studies have shown that the latter is most often the worse. (...)

Ram · Answer 4 · 2011-10-10

You can try mid-point rooting:

In the absence of a good outgroup the root may be positioned by assuming approximately equal evolutionary rates over all the branches. In this way the root is put at the midpoint of the longest pathway between two OTUs. This way of rooting is called mid-point rooting.

It's implemented in the ETE package, if you are familiar with Python.
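For anyone who wants to see what midpoint rooting does under the hood, here is a library-free sketch rather than ETE's own code. It assumes the unrooted tree is given as a list of `(node, node, branch_length)` edges; the function names, node names and branch lengths are all made up for illustration. It finds the longest tip-to-tip path and reports where its midpoint falls. In ETE 3 the equivalent is, as far as I know, `Tree.get_midpoint_outgroup()` followed by `Tree.set_outgroup()`, but check that against your installed version.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Turn (u, v, length) edges into a symmetric adjacency mapping."""
    adjacency = defaultdict(dict)
    for u, v, length in edges:
        adjacency[u][v] = length
        adjacency[v][u] = length
    return adjacency


def farthest_from(adjacency, start):
    """Return (node, distance, path) for the node farthest from start."""
    best = (start, 0.0, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        if dist > best[1]:
            best = (node, dist, path)
        for neighbour, length in adjacency[node].items():
            if neighbour != parent:
                stack.append((neighbour, node, dist + length, path + [neighbour]))
    return best


def midpoint_root(edges):
    """Locate the midpoint of the longest tip-to-tip path.

    Returns (u, v, offset): the root sits on edge u-v, `offset` units from u.
    """
    adjacency = build_adjacency(edges)
    any_node = next(iter(adjacency))
    # Two passes: the farthest node from anywhere is one end of the longest path.
    tip_a, _, _ = farthest_from(adjacency, any_node)
    _, longest, path = farthest_from(adjacency, tip_a)
    half = longest / 2.0
    travelled = 0.0
    for u, v in zip(path, path[1:]):
        length = adjacency[u][v]
        if travelled + length >= half:
            return u, v, half - travelled
        travelled += length
    raise ValueError("tree has no edges")


# Hypothetical tree: D sits on a much longer branch than A, B and C.
edges = [
    ("A", "n1", 1.0),
    ("B", "n1", 1.0),
    ("n1", "n2", 1.0),
    ("C", "n2", 1.5),
    ("D", "n2", 6.0),
]
print(midpoint_root(edges))
```

Note how the single long branch to `D` pulls the root onto that branch. This is the molecular clock assumption at work, and it is why midpoint rooting can mislead when rates differ a lot between lineages.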

Ram · Answer 5 · 2011-10-11

To expand on jhc's answer (and my comment to it), to reduce the impact of LBA you can try:

Removing the more distantly-related (or fast-evolving) sequences from the alignment - e.g. your original outgroup plus those other early-diverging guys at the "top" of your figure and re-aligning.
Inferring the tree using a better model (depending on the one you used the first time around). Models like CAT, empirical CATs (e.g. CAT20, CAT50), UL3 etc. implemented in Phylobayes (www.phylobayes.org) might be good. Phylobayes can also do posterior predictive simulations that are supposed to (roughly speaking) give you some idea of whether the model might be doing an adequate job of avoiding LBA.
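For the first suggestion, a rough way to shortlist candidates for removal is to flag taxa whose root-to-tip distance is far above the median. The sketch below assumes you have already extracted those distances from your tree; the taxon names, the values and the `factor=2.0` cutoff are all hypothetical, and the cutoff in particular is an arbitrary choice rather than an established threshold.

```python
from statistics import median


def flag_long_branches(root_to_tip, factor=2.0):
    """Return taxa whose root-to-tip distance exceeds factor x the median."""
    cutoff = factor * median(root_to_tip.values())
    return sorted(taxon for taxon, dist in root_to_tip.items() if dist > cutoff)


# Hypothetical root-to-tip distances (substitutions per site).
distances = {
    "sp1": 0.10,
    "sp2": 0.12,
    "sp3": 0.11,
    "sp4": 0.45,
    "old_outgroup": 0.60,
}
print(flag_long_branches(distances))
```

Treat the flagged taxa as things to test by removal and re-alignment, not as proof of LBA.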

Ram · Answer 6 · 2014-09-19

Hi, I'd like to resurrect this topic and add a couple of related questions. I'm making an ML phylogeny from a small number (3) of 18S sequences belonging to an obscure group of subterranean amphipods, and am finding it difficult to find a suitable outgroup. Using two outgroups obtained from the relevant literature gives very long (divergent) branch lengths between the outgroups and the ingroup, and so an unsatisfactory topology. The deep ML nodes are unsupported. Could it be that my ingroup sequences are wrongly identified? And could the unsupported deeper nodes be an artefact of the low sample size in this case? (Note: there are very few GenBank sequences available for this group.)