Question: Low Support Value Of Phylogenetic Trees
9.3 years ago
United States
Dejian

The support values on my phylogenetic tree are very low. What are the possible reasons, and how can I improve the stability of the tree? (The peptide sequences were retrieved from the gene sets of different insect genomes.)

phylogenetics statistics • 7.4k views
I would appreciate knowing how to read and interpret a phylogenetic tree for viruses.

What do the bootstrap numbers mean, and how should they be interpreted?



The bootstrap value at a node indicates how consistently the data support that grouping: it is the percentage of bootstrap replicates (alignments resampled column by column, with replacement) whose trees recover the same clade. A tree is calculated from existing molecular data, and the more informative data one can gather, the easier it is to place the sequences in a well-supported phylogeny. If the sequence evidence is not good enough, the bootstrap values will be low.
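To make the idea concrete, here is a minimal sketch of the resampling step and of how a support percentage is counted. The function names and the toy data are made up for illustration, and the tree-building step is deliberately left out: the clades per replicate are simply written by hand, where a real analysis would infer a tree from each resampled alignment.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by sampling columns with replacement."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}

def clade_support(replicate_clades, clade):
    """Percentage of replicate trees that contain the given clade."""
    clade = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)

# Toy alignment (hypothetical sequences, all the same length).
alignment = {"A": "MKTLLV", "B": "MKTLIV", "C": "MRSLIV", "D": "MRSAIV"}
replicate = bootstrap_alignment(alignment, random.Random(0))

# Hand-written stand-in for the clades found in trees from four replicates.
replicate_clades = [
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
    {frozenset({"A", "C"}), frozenset({"B", "D"})},
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
]
support_ab = clade_support(replicate_clades, {"A", "B"})
print(support_ab)
```

In this toy case the clade (A, B) appears in three of the four replicates, so it would be labelled with a support of 75.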

9.3 years ago
Cambridge, UK
Stefano Berri

Tell us more. How do you produce the tree?

In general, you should (1) get the protein sequences*, (1a) remove pseudogenes if you can, (2) align them (e.g. with ClustalX), (3) manually select the conserved region(s), (4) iterate between steps (2) and (3) until the alignment is stable, and then (5) bootstrap, (6) produce trees and (7) build a consensus tree.

I suspect you might have skipped steps (3) and (4). If you still have low bootstrap values after doing them, I suspect you will find some branches with good support but lower values in the "high branches". Focus on clusters of well-conserved genes: remove them, cluster them on their own, and re-cluster the rest with only one or two genes from each group.

Phylogenetic trees are a bit of a craft...

  • As lh3 says in the comment below, you could also use DNA. DNA is likely to improve the tree only if the protein sequences are all VERY similar (95% identity or more).
Using "the protein sequences" is not always the right thing to do. It depends. See also this question.
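A quick way to check whether the proteins fall into that "very similar" range is to compute pairwise percent identity on the protein alignment. The sketch below is only an illustration: the function names and sequences are invented, and it assumes one particular definition of identity (matches divided by the number of columns where neither sequence has a gap); other definitions give slightly different numbers.

```python
from itertools import combinations

def percent_identity(a, b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def min_pairwise_identity(alignment):
    """Lowest percent identity among all pairs in the alignment."""
    return min(
        percent_identity(a, b) for a, b in combinations(alignment.values(), 2)
    )

# Hypothetical aligned protein sequences.
aligned = {
    "s1": "MKTLLVAGSA",
    "s2": "MKTLLVAGSA",
    "s3": "MKTLIVAGSA",
}
lowest = min_pairwise_identity(aligned)
all_very_similar = lowest >= 95.0  # threshold taken from the answer above
print(lowest, all_very_similar)
```

If even the most distant pair is above the threshold, the protein alignment probably carries very little signal and the DNA sequences may be worth trying.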

There are programs that automatically trim poorly aligned regions. There was a question about this, but I could not find it just now.

@lh3 trimAl or Gblocks can be used for trimming the alignment. I recommend trimAl.
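For readers wondering what such trimming does in principle, here is a very simplified sketch that drops alignment columns with too many gaps. It is not the algorithm used by trimAl or Gblocks (both apply more elaborate criteria); the function name, the 0.5 threshold and the sequences are assumptions chosen for illustration.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5, gap="-"):
    """Keep only columns whose fraction of gaps is at most max_gap_fraction."""
    names = list(alignment)
    seqs = [alignment[n] for n in names]
    n_seqs = len(seqs)
    keep = [
        i
        for i in range(len(seqs[0]))
        if sum(s[i] == gap for s in seqs) / n_seqs <= max_gap_fraction
    ]
    return {n: "".join(alignment[n][i] for i in keep) for n in names}

# Hypothetical alignment: columns 3 and 6 are mostly gaps.
alignment = {"a": "MK-TL-", "b": "MKATL-", "c": "M--TLV"}
trimmed = trim_gappy_columns(alignment)
print(trimmed)
```

With the default threshold, the two columns where two of the three sequences have a gap are removed and the remaining four columns are kept.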

9.3 years ago
IIMCB, Poland
Leszek

What is your gene or family of interest? If you are dealing with completely sequenced species, you can try browsing PhylomeDB and Ensembl Metazoa for your genes of interest. Both offer good-quality reconstructed alignments and trees.

9.3 years ago
Newcastle, UK
Tancata

Another question is how much data you have (i.e. how many well-aligned positions). If it is, for example, a single-gene alignment, it might not contain much information about the poorly supported branches, and you would not necessarily have done anything wrong. Poorly aligned (or very variable) sequences, or sequences that are almost identical, might also not "be able" to resolve many of the branches on the tree. Or the sequences might be evolving in an unusual way.
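One rough way to put a number on "how much data" is to count gap-free columns, and among those the variable and parsimony-informative ones. The sketch below uses an invented function name and toy sequences; treating any column with a gap as unusable is a simplifying assumption, and the parsimony-informative test follows the usual definition (at least two character states, each present in at least two sequences).

```python
from collections import Counter

def count_sites(alignment, gap="-"):
    """Count gap-free, variable and parsimony-informative columns."""
    gap_free = variable = informative = 0
    for column in zip(*alignment.values()):
        if gap in column:
            continue  # simplification: ignore any column containing a gap
        gap_free += 1
        counts = Counter(column)
        if len(counts) > 1:
            variable += 1
        # Informative: at least two states, each seen in two or more sequences.
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            informative += 1
    return {"gap_free": gap_free, "variable": variable, "informative": informative}

# Hypothetical alignment of four sequences.
alignment = {"A": "MKTLLV-", "B": "MKTLIV-", "C": "MRSLIVA", "D": "MRSAIVA"}
sites = count_sites(alignment)
print(sites)
```

Here only two of the seven columns are informative, which is the kind of situation where low support on some branches would not be surprising.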

9.2 years ago
San Diego, CA, USA
Cmzmasek

Sometimes it helps to remove the most divergent sequences. Also, the quality of the MSA is of the utmost importance; I recommend using MAFFT or ProbCons to compute it.
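To spot candidates for removal, one simple heuristic is to rank sequences by their mean uncorrected distance (p-distance) to all the others in the alignment. This is only a sketch with hypothetical names and data, and the ranking is a hint for manual inspection rather than a rule for what to delete.

```python
def p_distance(a, b, gap="-"):
    """Fraction of differing residues over columns without gaps in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 1.0  # assumption: treat non-overlapping sequences as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)

def rank_by_divergence(alignment):
    """Return (name, mean p-distance to the others), most divergent first."""
    names = list(alignment)
    mean_dist = {}
    for n in names:
        others = [p_distance(alignment[n], alignment[m]) for m in names if m != n]
        mean_dist[n] = sum(others) / len(others)
    return sorted(mean_dist.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical alignment in which X is an obvious outlier.
alignment = {"A": "MKTLLV", "B": "MKTLIV", "C": "MKTLIV", "X": "QRSAWV"}
ranked = rank_by_divergence(alignment)
print(ranked[0])
```

The sequence at the top of the ranking is the first one to check by eye: it may be genuinely divergent, misaligned, or a bad gene prediction.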

