Multi-Loci DNA Sequence Based Phylogenetic Tree
11.2 years ago

Hi,

I have a few loci (26) for which I have the corresponding sequence (same length, aligned, without gaps) in seven species. I would like to make a phylogenetic tree based on all these sequences. It seems pretty obvious how to build such a tree for only one locus, but I am unfamiliar with phylogenetic trees and I have no clue how to construct a tree based on multi-locus information like mine. I suspect there may be many approaches or statistical models.

Do you have any experience with which program or approach would be best to tackle this problem?

Many thanks!

EDIT: Someone in the lab just suggested Stem, Beast and Best. Any experience with those?

phylogenetics sequence tree • 8.5k views
11.2 years ago
Jpromvi ▴ 50

Phylogenetic trees from multiple genes can be obtained in two fundamentally different ways. In one, gene sequences are concatenated into a super-gene alignment, which is then analyzed to generate the species tree. In the other, phylogenies are inferred separately from each gene, and a consensus of these gene phylogenies is used to represent the species tree.

The following article describes something that could be similar to what you are trying to do, and has a lot of references on the methods they use. I hope it helps:

Fitzpatrick, Logue, Stajich and Butler. A fungal phylogeny based on 42 complete genomes derived from supertree and combined gene analysis. BMC Evolutionary Biology 2006, 6:99.


Hi,

Following your answer, I'm wondering about the mutation model of each gene. Do you mean that you assume that the genes have the same mutation model when you concatenate them into a super-gene alignment?

Cheers!

11.2 years ago
David W 4.8k

Hi Eric,

This is just about my mastermind specialty topic (much more so than the coding business anyway ;). Jpromvi's answer pretty much covers it, but I thought I'd add some more details.

Your basic choices are:

  1. One big alignment (usually with different mutation models for each gene)
  2. Estimate a 'gene tree' for each locus and co-estimate a species tree

Number one will likely be a lot quicker, but it can run into all sorts of trouble (because a few strong signals from genes that don't represent the organismal relationships can override the species tree).

Which you use will depend on your question (you might not even care about the species-level relationships) and perhaps the shape of your tree (short internal branches increase the probability of incomplete lineage sorting, one of the sources of error).
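To put a number on the point about short internal branches, here is a minimal sketch using the standard three-species result from the multispecies coalescent: the probability that a gene tree has the same topology as the species tree is 1 - (2/3)·exp(-T), where T is the internal branch length in coalescent units. The function name is made up for illustration, and T is taken as given, because how you convert generations into coalescent units depends on the convention and ploidy you assume.

```python
import math


def prob_gene_tree_matches_species_tree(internal_branch: float) -> float:
    """Three-species case: probability that a gene tree shares the
    species-tree topology, given the internal branch length in
    coalescent units (assumed to be supplied by the caller)."""
    if internal_branch < 0:
        raise ValueError("branch length must be non-negative")
    return 1.0 - (2.0 / 3.0) * math.exp(-internal_branch)


if __name__ == "__main__":
    # Shorter internal branches give more discordant gene trees.
    for t in (0.1, 0.5, 1.0, 3.0):
        p = prob_gene_tree_matches_species_tree(t)
        print(f"T = {t:>3}: P(match) = {p:.3f}, P(discordant) = {1 - p:.3f}")
```

With T = 0 the three possible topologies are equally likely (probability 1/3 each), which is why a few loci can easily disagree with the species tree when internal branches are short.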

If you go down the gene-tree route, then the three choices you mention, BEAST, BEST and STEM, are the main options. STEM relies on trees estimated by some other method (so you reconcile your gene trees with a species designation). It doesn't work for my question, because you need branch lengths in "coalescent units" (generations/effective population size) and I don't have the data to estimate that reliably, so I don't know much about it.

BEAST and BEST take alignments and do much the same thing. I've found the default priors in BEST make it almost impossible for the MCMC runs to converge (especially the population size prior), and BEAST runs a good deal faster. (The species tree option isn't well documented in BEAST, but you just need to import species names as a "trait" in BEAUti, the GUI BEAST file generator.)

There, have I managed to thoroughly confuse you?


Also, MrBayes would work. If you build the tree from one big alignment, the key thing is that you'd want to estimate and set your model of molecular evolution (and possibly other parameters) separately for each gene/data partition.
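To set a model per gene in a concatenated alignment, the program needs to know which columns belong to which locus. Here is a minimal sketch that derives those column ranges from the locus lengths. The locus names and lengths are invented, and the `DNA, name = start-end` layout is modeled on a RAxML-style partition file as I understand it; MrBayes uses its own `charset` syntax, so check the manual of whichever program you run.

```python
def partition_lines(locus_lengths, data_type="DNA"):
    """Turn [(locus_name, alignment_length), ...] into 1-based, inclusive
    column ranges, in the order the loci were concatenated."""
    lines = []
    start = 1
    for name, length in locus_lengths:
        if length <= 0:
            raise ValueError(f"locus {name!r} has non-positive length")
        end = start + length - 1
        lines.append(f"{data_type}, {name} = {start}-{end}")
        start = end + 1
    return lines


if __name__ == "__main__":
    # Hypothetical loci; replace with your own names and lengths.
    example = [("locus01", 500), ("locus02", 320), ("locus03", 410)]
    print("\n".join(partition_lines(example)))
    # DNA, locus01 = 1-500
    # DNA, locus02 = 501-820
    # DNA, locus03 = 821-1230
```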

11.2 years ago
Dave Lunt ★ 2.0k

I agree with most of what is written above, but since you say "I am unfamiliar with phylogenetic trees", maybe you would like some simplifications and direct advice.

Firstly, don't build a tree for each locus and try to put them together. Just concatenate your sequences for each species into one big alignment and build a single tree. There are lots of ways to concatenate, the simplest being copy and paste into one big FASTA file. I like to use FasConcat, which is easy and flexible and prevents screwups.

Then the easiest/fastest place to build a high-quality maximum likelihood tree is RAxML. Just remember to check the button that says "Maximum likelihood search" on the web form, or else it won't construct the tree you want. MrBayes, BEAST etc. are fine programs, but probably not as good as RAxML for your purposes, and they will increase your stress levels unnecessarily as you try to get used to them.

There are lots of sophistications to phylogenetic analysis, but you should be able to get a very good quality tree without too much pain with FasConcat and RAxML. Ask again if you need help with fancy things after you have that tree.
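For anyone who prefers a script to copy and paste, here is a minimal sketch of the concatenation step in plain Python. It is not FasConcat. The function names and the toy sequences are made up, and it assumes every locus is already aligned and uses identical species names in its FASTA headers.

```python
def parse_fasta(text):
    """Parse FASTA text into {species_name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            if name in records:
                raise ValueError(f"duplicate record {name!r}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data before first header")
        else:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def concatenate(loci):
    """Join per-locus alignments (list of {species: sequence}) into one
    super-alignment, checking species sets and alignment lengths."""
    if not loci:
        raise ValueError("no loci given")
    species = sorted(loci[0])
    joined = {s: [] for s in species}
    for index, locus in enumerate(loci, start=1):
        if sorted(locus) != species:
            raise ValueError(f"locus {index} has a different species set")
        if len({len(seq) for seq in locus.values()}) != 1:
            raise ValueError(f"locus {index} is not aligned (unequal lengths)")
        for s in species:
            joined[s].append(locus[s])
    return {s: "".join(parts) for s, parts in joined.items()}


if __name__ == "__main__":
    # Toy data: two loci, two species.
    locus1 = parse_fasta(">sp_A\nACGT\n>sp_B\nACGA\n")
    locus2 = parse_fasta(">sp_A\nGG\n>sp_B\nGC\n")
    for name, seq in concatenate([locus1, locus2]).items():
        print(f">{name}\n{seq}")
```

The checks are the point of scripting this: a missing species or a locus of the wrong length fails loudly instead of silently shifting columns.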

11.2 years ago

You can also use codeml from the PAML package. It has a model for the three codon positions if your alignments are CDS back-translations of protein-coding genes.
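To make "the three codon positions" concrete, here is a minimal sketch that splits an in-frame coding sequence into its first, second and third codon positions. It does not call codeml or reproduce its models. The function name and the example sequence are made up, and it assumes the sequence starts in frame with no gaps.

```python
def split_by_codon_position(cds):
    """Return (first, second, third) codon-position columns of an
    in-frame coding sequence whose length is a multiple of three."""
    if len(cds) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return cds[0::3], cds[1::3], cds[2::3]


if __name__ == "__main__":
    first, second, third = split_by_codon_position("ATGGCCTAA")
    print(first, second, third)  # AGT TCA GCA
```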

11.2 years ago

I heard about a tool called pplacer yesterday. If you have a reference alignment, you may be able to use pplacer to get the tree. PS: I have not tried this myself.


pplacer performs phylogenetic placement of a sequence on an existing tree. I don't think that was the question.
