Question: About Outgroups In Phylogenetic Analysis
6.5 years ago by
UC Berkeley, USA / Shanxi Univ, China
Yongjie Zhang wrote:

Hi All,

I want to run phylogenetic analyses on single-locus data sets as well as on the combined multi-gene data set.

For the single-locus data set, I found the two most suitable outgroups by running BlastN against a public database. However, the sequences of the two outgroup taxa are shorter than my ingroup sequences at both ends (5' and 3'), although the outgroups and ingroups align perfectly over the region they share. I don't want to trim my ingroup sequences at either end of the alignment, because those regions (present in my ingroups but absent from the outgroups) contain valuable informative characters. In this case, can I still use the two outgroups and keep alignment gaps (or, more strictly, missing data) at both ends of the outgroup sequences?

My other concern is that when I use one outgroup taxon, no support value is shown for the ingroup clade, but when I use two outgroup taxa, the ingroup clade gets 100% support. Why is that, and do I have to use at least two outgroup taxa?

For the combined data set, my question is also about the outgroups. Because different outgroup taxa were used for each single-locus data set, how should I choose the outgroups for the multi-locus data set? Can I concatenate the outgroup sequences from the single-locus data sets? By doing so, I might be creating an artificial taxon/sequence.

Hope to have your help!



modified 6.5 years ago by DG • written 6.5 years ago by Yongjie Zhang
6.5 years ago by
DG wrote:

What type of phylogenetic analysis are you doing? That sometimes affects outgroup choice. In general, keep in mind that maximum-likelihood phylogenetics usually estimates an unrooted tree, which you can then view as rooted using the outgroup of your choice. If you did this and rooted the tree on only one of the two "outgroup" taxa, it wouldn't be surprising to see poor support for the clade of interest if the other outgroup taxon is included in what you are looking at. Note too that if only one outgroup sequence is in the data set, the split between it and the ingroup is a terminal branch of the unrooted tree, and support values are normally reported only for internal branches.

Also, it is fine not to trim. You don't have to, and indeed shouldn't, trim every site that contains gaps. Trim or mask only sites that are so full of gaps that they raise concerns about the quality of the alignment itself, or that are totally uninformative. You want to retain as many informative sites as possible, as long as the phylogenetic software you are using (and its underlying model) handles gapped alignments. These days there is no excuse not to use good software that does.
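
To make the "only mask very gappy sites" idea concrete, here is a minimal Python sketch. It is only an illustration: the function name `mask_gappy_columns`, the `max_gap_fraction` threshold and the toy sequences are all assumptions of mine, not something from any particular package, and dedicated alignment-trimming tools do this more carefully.

```python
def mask_gappy_columns(alignment, max_gap_fraction=0.9, gap_chars="-?"):
    """Drop columns whose fraction of gap/missing characters exceeds the threshold.

    alignment: dict mapping taxon name -> aligned sequence (all the same length).
    Hypothetical helper for illustration only.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all sequences must have the same aligned length")

    keep = []
    for i in range(length):
        gaps = sum(1 for seq in seqs if seq[i] in gap_chars)
        if gaps / len(seqs) <= max_gap_fraction:
            keep.append(i)

    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Toy data (made up): the outgroup is shorter at both ends, and column 5
# is a gap in three of the four sequences.
toy = {
    "ingroup_1": "ACGT-CGTAC",
    "ingroup_2": "ACGT-CGTAC",
    "ingroup_3": "ACGA-CGTAC",
    "outgroup_A": "--GTACGT--",
}
trimmed = mask_gappy_columns(toy, max_gap_fraction=0.5)
```

With this threshold the ends are kept (only one sequence in four is missing there), while the column that is mostly gaps is dropped. The right threshold depends on your data, so treat 0.5 or 0.9 as placeholders.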

It is normal to concatenate multiple genes for phylogenetic analyses. Depending on your data set, there may be issues with congruence among genes, though. Keep in mind that a gene tree can differ from the actual species tree for valid biological reasons. But in general, concatenating a few genes to improve your phylogenetic reconstruction is normal and acceptable practice.

written 6.5 years ago by DG

Dear Dan,

Thanks for your answer.

My ingroup consists of over 100 individuals belonging to the same fungal species. Most of the 7 genes I used are specific to this fungal species, so choosing suitable outgroups has become a problem. For 3 of the genes I chose two outgroup sequences per gene by BlastN, but for the other 4 genes I cannot find a suitable outgroup. For the 3 genes that have two outgroups each, the outgroup taxa all differ among genes (for the first gene, the outgroups are species A and B; for the second, species C and D; and for the third, species E and F). I'm still not clear on how to choose the outgroups for the multi-gene phylogeny. Can I concatenate the outgroup sequences from the 3 genes anyway (although I would actually be creating nonexistent taxa) and leave the other 4 genes blank for these assumed outgroups? That is, the 7 genes of outgroup 1 would be A, C, E, blank, blank, blank, blank, and those of outgroup 2 would be B, D, F, blank, blank, blank, blank.


written 6.5 years ago by Yongjie Zhang

You definitely do not want to make artificial or composite taxa. You can try a concatenated analysis with missing data in your outgroup taxa. I'm not sure how good the results will be, but it is probably worth a shot.
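
As a rough sketch of what "concatenated with missing data" could look like, assuming each gene is already aligned and stored as a simple taxon-to-sequence mapping: every real taxon keeps its own row, and any gene a taxon lacks is filled with `?`. The function name `concatenate_loci`, the taxon names and the sequences below are made up for illustration.

```python
def concatenate_loci(loci, missing_char="?"):
    """Concatenate per-gene alignments into one supermatrix.

    loci: list of dicts, one per gene, mapping taxon name -> aligned sequence.
    A taxon absent from a gene gets a run of missing_char for that gene, so no
    composite taxon is ever built. Returns (supermatrix, partitions), where
    partitions holds 1-based inclusive (start, end) column ranges per gene.
    Hypothetical helper for illustration only.
    """
    taxa = sorted({taxon for locus in loci for taxon in locus})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for locus in loci:
        lengths = {len(seq) for seq in locus.values()}
        if len(lengths) != 1:
            raise ValueError("each locus must be a non-empty, equal-length alignment")
        n = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(locus.get(taxon, missing_char * n))
        partitions.append((start, start + n - 1))
        start += n
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Toy example (made-up sequences): species_A is only available for gene 1,
# species_C only for gene 2, and gene 3 has no outgroup at all.
gene1 = {"iso1": "ACGT", "iso2": "ACGA", "species_A": "ACTT"}
gene2 = {"iso1": "GGC", "iso2": "GGC", "species_C": "GAC"}
gene3 = {"iso1": "TT", "iso2": "TA"}
matrix, partitions = concatenate_loci([gene1, gene2, gene3])
```

Here `species_A` and `species_C` stay separate rows, each mostly `?`, rather than being merged into one artificial outgroup. The partition ranges are returned in case you want to give each gene its own model in whatever software you use.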

written 6.5 years ago by DG