Question: Construction Of Big Phylogenetic Tree
sanchezcavani wrote:

I am trying to construct a large phylogenetic tree involving about 800 OTUs. I think I need to reduce the number of OTUs in the tree by choosing some representative species, but I am not sure how to do it. Does anyone know how representative species should be chosen? Or is there a paper on this issue? Thanks a lot!

written 5.2 years ago by sanchezcavani • modified 5.2 years ago by Joseph Hughes

Well, sequence type and length as well as your computational resources come into play here. Consider the quality of alignment and tree. Large, diverse sets of species are generally preferable.

written 5.2 years ago by Spitshine
Joseph Hughes wrote:

You can easily generate phylogenies of 800 OTUs using tools such as RAxML. If you really want to reduce your set of OTUs, you will need to choose a similarity threshold between the sequences, and that choice is rather subjective. Useful tools for this are CD-HIT and its companion cdhit-cluster-consensus, which create non-redundant sets by merging sequences that are more similar than your arbitrary threshold.

written 5.2 years ago by Joseph Hughes
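
To make the threshold idea above concrete, here is a minimal Python sketch of greedy representative selection. It is not CD-HIT's actual algorithm (CD-HIT works on unaligned sequences and uses word filtering). This sketch assumes the sequences are already rows of one multiple alignment, and the names `identity`, `pick_representatives`, and the toy OTUs are hypothetical.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    return matches / compared if compared else 0.0


def pick_representatives(aligned: Dict[str, str], threshold: float) -> Dict[str, List[str]]:
    """Greedy clustering: map each representative OTU to the OTUs it absorbs."""
    # Visit the most complete sequences (fewest gaps) first, so that they
    # become the representatives; ties are broken by name for reproducibility.
    order = sorted(
        aligned,
        key=lambda name: (-(len(aligned[name]) - aligned[name].count("-")), name),
    )
    clusters: Dict[str, List[str]] = {}
    for name in order:
        for rep in clusters:
            if identity(aligned[name], aligned[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            clusters[name] = [name]
    return clusters


# Toy alignment (assumption: made-up sequences, for illustration only).
toy_alignment = {
    "otu_A": "ACGTACGTAC",
    "otu_B": "ACGTACGTAT",
    "otu_C": "ACGTACG-AC",
    "otu_D": "TTGTCCGAAC",
}
clusters = pick_representatives(toy_alignment, threshold=0.9)
for rep, members in clusters.items():
    print(rep, members)
```

Raising the threshold keeps more representatives and lowering it keeps fewer, which is exactly where the subjectivity comes in. It may be worth trying a few values and checking that the retained OTUs still cover the diversity you care about.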

Yup, 800 OTUs is certainly feasible, especially for single-gene phylogenies. More diversity is usually better, although be careful of things like paralogs in your dataset (unless that is something you intend to look at). Otherwise, as suggested, I would only look at removing sequences that don't really add much sequence diversity to your dataset.

written 5.2 years ago by Dan Gaston

Thanks! RAxML will take more than two weeks, which is still a little long. I am also concerned that the tree figure may not be very clear if it shows the full species tree.

written 5.2 years ago by sanchezcavani

There is also RAxML-Light, which will run faster. FastTree 2 should also give you an approximation of the ML tree quite rapidly. Assuming groups resolve themselves well in your tree (not guaranteed if it is a single-gene phylogeny), you can collapse groups afterwards with a program like FigTree when making final figures. Then include the full tree in your supplementary materials.

As an aside, two weeks isn't a long time for doing a good phylogenetic analysis. Large phylogenies often take weeks to months, even on clusters, to do properly for some analyses. Anything worth doing is worth doing the best possible way.

written 5.1 years ago by Dan Gaston
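
The collapsing step mentioned above is done interactively in FigTree, but the underlying idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tree is represented as nested tuples rather than parsed from a real Newick file, the grouping rule (`genus`, the text before the first underscore) is hypothetical, and the `Genus_xN` label format is made up for this example.

```python
from typing import Callable, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def genus(leaf: str) -> str:
    """Hypothetical grouping rule: the genus is the text before the first underscore."""
    return leaf.split("_", 1)[0]


def _collapse(tree, group_of):
    # Returns (new subtree, group shared by all leaves or None, number of leaves).
    if isinstance(tree, str):
        return tree, group_of(tree), 1
    results = [_collapse(child, group_of) for child in tree]
    groups = {group for _, group, _ in results}
    n_leaves = sum(count for _, _, count in results)
    if len(groups) == 1 and None not in groups:
        # Every leaf below this node belongs to one group: summarise the clade.
        group = groups.pop()
        return f"{group}_x{n_leaves}", group, n_leaves
    return tuple(sub for sub, _, _ in results), None, n_leaves


def collapse(tree: Tree, group_of: Callable[[str], str] = genus) -> Tree:
    """Replace each clade whose leaves share a single group with one summary leaf."""
    return _collapse(tree, group_of)[0]


def to_newick(tree: Tree) -> str:
    """Serialise the nested-tuple tree as a topology-only Newick string."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


# Toy tree (assumption: topology chosen only to illustrate the collapsing).
full_tree = (
    (("Pan_troglodytes", "Pan_paniscus"), "Homo_sapiens"),
    (("Mus_musculus", "Mus_spretus"), "Mus_caroli"),
)
figure_tree = collapse(full_tree)
print(to_newick(figure_tree) + ";")
```

Only groups that come out as a single clade are collapsed, so a genus that is split across the tree stays expanded. That matches the caveat above that groups need to resolve well before they can be collapsed for a figure.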

Thanks for your helpful suggestions!

written 5.1 years ago by sanchezcavani