Question

About Outgroups In Phylogenetic Analysis



11.3 years ago

Yongjie Zhang ▴ 110

Hi All,

I want to do phylogenetic analyses based on single-locus data sets as well as based on the combined multiple-gene dataset.

For the single-locus data set, I found the two most suitable outgroups by running BLASTN against a public database. However, the sequences of the two outgroup taxa are shorter than my ingroup sequences at both ends (5' and 3'), although the outgroups align well with the ingroups over the region they cover. I don't want to trim my ingroups at both ends of the alignment, because those regions (where my ingroups have sequence but the outgroups don't) contain valuable informative characters. In my case, can I still use the two outgroups, keeping alignment gaps (or, more strictly, missing data) at both ends of the outgroup sequences?

My other concern is that when I use one outgroup taxon, no support value is shown for the ingroup clade, but when two outgroup taxa are used, the ingroup clade has 100% support. Why is this so, and do I have to use at least two outgroup taxa?

For the combined dataset, my question is also about the outgroups. Because different outgroup taxa were used for each single-locus data set, how should I choose the outgroups for the multiple-locus data set? Can I concatenate the outgroup sequences from each single-locus data set? By doing so, I may create artificial (chimeric) taxa/sequences.

I hope you can help!

Thanks.

Yongjie


Written 11.3 years ago by Yongjie Zhang; updated 11.3 years ago by DG.

score 1 · Answer 1 · 2014-03-18

What type of phylogenetic analysis are you doing? That can affect outgroup choice somewhat. In general, though, keep in mind that in maximum-likelihood phylogenetics you are usually estimating an unrooted tree, which you can then view as rooted using the outgroup of your choice. If you did this and rooted the tree with only one of the two "outgroup" taxa, it wouldn't be surprising to see poor support for the clade of interest, because the other outgroup taxon is then being included in what you are looking at.
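One way to see why a single outgroup can leave the ingroup clade without any support value: bootstrap and similar support values are attached to bipartitions (splits) of the unrooted tree. With only one outgroup, "ingroup vs. outgroup" is just the terminal branch leading to that single taxon, a trivial split present in every possible tree, so there is nothing to support. With two outgroups it becomes an internal branch that a tree may or may not contain. The minimal Python sketch below illustrates this; the nested-tuple tree format and the names `leaves`, `splits` and `supports_ingroup` are hypothetical, for illustration only, and are not the API of any phylogenetics package.

```python
from typing import FrozenSet, Set, Union

# Hypothetical tree format: a leaf is a taxon name, an internal node is a tuple
# of children. The outermost tuple is an arbitrary rooting of an unrooted tree.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> FrozenSet[str]:
    """All taxon names below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial bipartitions of the unrooted tree.

    Each split is stored as the side NOT containing a fixed reference taxon,
    so the arbitrary rooting of the nested tuples does not matter.
    """
    all_taxa = leaves(tree)
    ref = min(all_taxa)
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> None:
        clade = leaves(node)
        side = clade if ref not in clade else all_taxa - clade
        # Non-trivial only if both sides contain at least two taxa.
        if 2 <= len(side) <= len(all_taxa) - 2:
            found.add(frozenset(side))
        if not isinstance(node, str):
            for child in node:
                visit(child)

    visit(tree)
    return found


def supports_ingroup(tree: Tree, outgroup: FrozenSet[str]) -> bool:
    """True if ingroup vs. outgroup is a non-trivial split (a branch that can carry support)."""
    all_taxa = leaves(tree)
    ingroup = all_taxa - outgroup
    ref = min(all_taxa)
    side = ingroup if ref not in ingroup else frozenset(outgroup)
    return side in splits(tree)


one_outgroup = ((("A", "B"), ("C", "D")), "O1")
two_outgroups = ((("A", "B"), ("C", "D")), ("O1", "O2"))
print(supports_ingroup(one_outgroup, frozenset({"O1"})))         # False: trivial split
print(supports_ingroup(two_outgroups, frozenset({"O1", "O2"})))  # True: internal branch
```

In this toy case the ingroup/outgroup split only exists as an internal branch, and so only has a support value to report, once there are at least two outgroup taxa.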

It is also fine not to trim. You don't have to, and indeed shouldn't, trim all sites that contain gaps. Trim or mask only sites that are so full of gaps that they raise concerns about the quality of the alignment itself, or that are totally uninformative. You want to retain as many informative sites as possible, as long as the phylogenetic software you are using (and its underlying model) handles gapped alignments. Today there is no excuse for not using software that does.
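The following minimal Python sketch illustrates this on a made-up toy alignment held as a dict of taxon name to aligned sequence (a hypothetical in-memory format; real data would be read from FASTA/PHYLIP with a proper library). `terminal_gaps_to_missing` recodes the leading and trailing gaps of a shorter sequence, such as the outgroups in the question, as missing data (`?`) while leaving internal gaps alone; many likelihood programs already treat `-` and `?` alike, so check your program's documentation. `mask_gappy_columns` drops only columns that are mostly gap/missing, where the 50% cutoff is an arbitrary assumption, and `parsimony_informative_sites` counts sites with at least two states that each occur in at least two taxa.

```python
from typing import Dict

# Hypothetical in-memory format: taxon name -> aligned sequence (equal lengths).
Alignment = Dict[str, str]
GAP, MISSING = "-", "?"


def terminal_gaps_to_missing(aln: Alignment) -> Alignment:
    """Recode leading/trailing gaps (e.g. a shorter outgroup read) as '?'.

    Internal gaps are left untouched, since they may reflect real indels.
    """
    out = {}
    for name, seq in aln.items():
        core = seq.strip(GAP)
        left = len(seq) - len(seq.lstrip(GAP))
        right = len(seq) - len(seq.rstrip(GAP)) if core else 0
        out[name] = MISSING * left + core + MISSING * right
    return out


def mask_gappy_columns(aln: Alignment, max_missing: float = 0.5) -> Alignment:
    """Drop only columns where more than max_missing of the taxa are gap/missing.

    The 0.5 default is an arbitrary, assumed threshold.
    """
    names = list(aln)
    length = len(aln[names[0]])
    keep = []
    for i in range(length):
        column = [aln[n][i] for n in names]
        frac = sum(c in (GAP, MISSING) for c in column) / len(column)
        if frac <= max_missing:
            keep.append(i)
    return {n: "".join(aln[n][i] for i in keep) for n in names}


def parsimony_informative_sites(aln: Alignment) -> int:
    """Count columns with >= 2 states that each occur in >= 2 taxa (gaps/missing ignored)."""
    names = list(aln)
    length = len(aln[names[0]])
    count = 0
    for i in range(length):
        states = [aln[n][i].upper() for n in names if aln[n][i] not in (GAP, MISSING)]
        repeated = [s for s in set(states) if states.count(s) >= 2]
        if len(repeated) >= 2:
            count += 1
    return count


# Toy data (made up): the outgroup is shorter at both ends.
toy = {
    "in1": "AACGTT",
    "in2": "AACGTT",
    "in3": "GGCGCC",
    "in4": "GGCGCC",
    "out1": "--CGC-",
}
recoded = terminal_gaps_to_missing(toy)
print(recoded["out1"])                          # ??CGC?
print(parsimony_informative_sites(recoded))     # 4
trimmed = {n: s[2:5] for n, s in toy.items()}   # cut everything to the outgroup's span
print(parsimony_informative_sites(trimmed))     # 1
```

On this toy data, keeping the full length retains 4 parsimony-informative sites, whereas trimming every sequence to the outgroup's span (columns 3 to 5) leaves only 1.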

It is normal to concatenate multiple genes for phylogenetic analyses. Depending on your dataset, though, there may be issues with congruence between genes. Keep in mind that a gene tree can differ from the actual species tree for valid biological reasons. But in general, concatenating a few genes to improve your phylogenetic reconstruction is normal and acceptable practice.
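Mechanically, concatenation builds a "supermatrix": each locus alignment is appended per taxon, and a taxon absent from a locus is padded with missing data (`?`) for that locus's length. The minimal Python sketch below shows this; the function name `concatenate`, the dict-based alignment format and the locus/taxon names are hypothetical. Because each taxon stays its own row, outgroups sampled for different loci can sit side by side, padded with `?`, instead of being merged into one composite sequence; whether that is preferable for a given dataset is a judgment call. The returned (locus, start, end) ranges, 1-based and inclusive, can be written out in whatever partition-file format your program expects.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory format: taxon name -> aligned sequence (equal lengths).
Alignment = Dict[str, str]
MISSING = "?"


def concatenate(loci: Dict[str, Alignment]) -> Tuple[Alignment, List[Tuple[str, int, int]]]:
    """Build a supermatrix from per-locus alignments.

    Taxa absent from a locus are filled with '?' for that locus's length.
    Returns the supermatrix and (locus, start, end) partition ranges,
    1-based and inclusive, in the order the loci were given.
    """
    taxa = sorted({t for aln in loci.values() for t in aln})
    rows: Dict[str, List[str]] = {t: [] for t in taxa}
    partitions = []
    start = 1
    for locus, aln in loci.items():
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"locus {locus!r} is empty or not aligned (unequal lengths)")
        length = lengths.pop()
        for t in taxa:
            rows[t].append(aln.get(t, MISSING * length))
        partitions.append((locus, start, start + length - 1))
        start += length
    return {t: "".join(parts) for t, parts in rows.items()}, partitions


# Made-up example: each locus has a different outgroup taxon.
loci = {
    "gene1": {"sp1": "ACGT", "sp2": "ACGA", "outA": "ACTT"},
    "gene2": {"sp1": "GGC", "sp2": "GGT", "outB": "GAC"},
}
matrix, parts = concatenate(loci)
print(matrix["outA"])  # ACTT???
print(matrix["outB"])  # ????GAC
print(parts)           # [('gene1', 1, 4), ('gene2', 5, 7)]
```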