Low bootstrap phylogenetic trees
6.1 years ago
asitsur • 0

Hi,

I am trying to generate a phylogenetic tree of a certain enzyme (~160 aa) which is highly conserved throughout Eukarya. I am especially interested in its evolution in Metazoa. My problem is that no matter what I do, I always end up with trees that have low bootstrap support.

I work as follows: I collected 31 aa sequences from 31 species representing Bilateria (mostly Lophotrochozoa and Ecdysozoa but also Deuterostomia), Cnidaria, Ctenophora, Porifera and Placozoa. I have rooted the tree with the choanoflagellate Monosiga brevicollis.

I use MAFFT-linsi for alignment. I usually also use GUIDANCE with several cutoff values to eliminate unreliable columns.

I generate the tree using RAxML gui with the following settings:

ML + slow bootstrap, 1000 runs, 1000 replications, Protgamma LG (or JTT as protein model) + empirical frequencies.

Any idea what I can change/add in order to increase the bootstrap support of the tree? I am at a loss...

alignment phylogeny • 4.7k views

You could look at whether your enzyme is represented in TreeFam and try adding your sequences to the corresponding family/tree and/or try TreeBest (or its Ensembl Compara variant) with your sequences.


How many columns do you have after removing the unreliable columns?

You could also try Gblocks and trimAl for removing poorly aligned columns.

You could also try ProtTest for model selection.

EDIT: How did you identify/collect the orthologs in the 31 species? This is very important.
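To answer the "how many columns" question with numbers, here is a minimal Python sketch that counts total, variable and parsimony-informative columns in a trimmed alignment. The function name `column_summary` and the choice of gap/ambiguity characters are assumptions made for illustration; the sequences are expected as equal-length aligned strings that you have already read from your alignment file.

```python
from collections import Counter


def column_summary(seqs, gap_chars="-X?"):
    """Count total, variable and parsimony-informative alignment columns.

    seqs: list of aligned sequences (strings of equal length).
    gap_chars: characters ignored when counting states (an assumption;
    adjust to whatever your alignment uses).
    """
    if not seqs:
        raise ValueError("no sequences given")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences are not aligned (unequal lengths)")

    variable = 0
    informative = 0
    for col in zip(*seqs):
        counts = Counter(c for c in col if c not in gap_chars)
        # Variable: more than one residue type observed in the column.
        if len(counts) > 1:
            variable += 1
        # Parsimony-informative: at least two residue types,
        # each present in at least two sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return {
        "columns": length,
        "variable": variable,
        "parsimony_informative": informative,
    }


if __name__ == "__main__":
    toy = ["MKLA", "MKLA", "MRIA", "MRVA"]
    print(column_summary(toy))
```

If the number of parsimony-informative columns is small compared with the number of taxa, low bootstrap values are to be expected regardless of the tree-building settings.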


Any particular reason for not using -m PROTGAMMAAUTO in RAxML? I am not sure about using empirical frequencies with so few sequences in the MSA. Have you curated your alignment manually? Have you tried any other alignment algorithms?

6.1 years ago

If it is highly conserved, you might simply not have enough signal... First, why do you only include 31 aa of the total of ~160 aa?

If the aa sequence is highly conserved, it would help to use the corresponding DNA sequences (aligned using the amino-acid alignment as a template), if available. These might provide additional signal, as long as the synonymous sites are not mutationally saturated, which is possible considering that you are working with very distant taxa. However, I think it is worthwhile to try.
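Aligning the DNA "using the amino-acid alignment as a template" means placing each codon under the residue it encodes and inserting three gap characters wherever the protein alignment has a gap. A minimal Python sketch of that idea follows; the function name `thread_codons` is hypothetical, and it assumes that each coding sequence is in frame and matches its protein exactly, apart from an optional trailing stop codon. Dedicated tools such as PAL2NAL handle mismatches and frameshifts more robustly.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def thread_codons(aligned_protein, cds, gap="-"):
    """Thread an unaligned coding sequence onto an aligned protein.

    aligned_protein: one row of the protein alignment (with gaps).
    cds: the matching unaligned, in-frame coding DNA sequence.
    Returns the codon-aligned DNA row (3 gap characters per protein gap).
    """
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for c in aligned_protein if c != gap)

    # Drop a trailing stop codon if the CDS carries one.
    if len(codons) == n_residues + 1 and codons[-1].upper() in STOP_CODONS:
        codons = codons[:-1]
    if len(codons) != n_residues:
        raise ValueError("CDS and protein lengths do not match")

    codon_iter = iter(codons)
    return "".join(
        gap * 3 if c == gap else next(codon_iter) for c in aligned_protein
    )


if __name__ == "__main__":
    print(thread_codons("M-KL", "ATGAAACTG"))
```

Note that this sketch does not check that each codon actually translates to the residue above it; adding that check is a good idea before trusting the result.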

6.1 years ago
confusedious ▴ 420

If you'd like to amplify the signal and reduce the noise in the data you are looking at, consider using a clique analysis like the one used by Gupta & Sneath (2007) in 'Application of the Character Compatibility Approach to Generalized Molecular Sequence Data: Branching Order of the Proteobacterial Subdivisions' (DOI: 10.1007/s00239-006-0082-2).

The largest clique of compatible sites might not be very large for your data, but you won't know until you try.
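To give a feel for what a clique analysis does, here is a rough Python sketch. It is not the procedure from the cited paper, only an illustration under stated assumptions: two alignment columns are treated as pairwise compatible when the graph linking their co-occurring states contains no cycle (for two-state columns this is the familiar four-gamete test), and the largest set of mutually compatible columns is found by a plain Bron-Kerbosch search. The names `compatible` and `largest_clique` are made up for this example, and the search is exponential in the worst case, so it only suits small numbers of columns. Also keep in mind that for multistate characters pairwise compatibility does not guarantee that the whole set fits a single tree.

```python
from itertools import combinations


def compatible(col_a, col_b, gap="-"):
    """Pairwise compatibility of two alignment columns.

    Builds a bipartite graph of co-occurring states and reports
    whether it is free of cycles (checked with union-find).
    Taxa with a gap in either column are ignored (an assumption).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = {(a, b) for a, b in zip(col_a, col_b) if a != gap and b != gap}
    for a, b in edges:
        root_a, root_b = find(("a", a)), find(("b", b))
        if root_a == root_b:  # this edge would close a cycle
            return False
        parent[root_a] = root_b
    return True


def largest_clique(columns):
    """Indices of a largest set of mutually compatible columns."""
    n = len(columns)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if compatible(columns[i], columns[j]):
            adj[i].add(j)
            adj[j].add(i)

    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:  # r is a maximal clique
            if len(r) > len(best):
                best = sorted(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(range(n)), set())
    return best


if __name__ == "__main__":
    # Each string is one column, read down four taxa.
    cols = ["AABB", "AABB", "ABAB", "AAAB"]
    print(largest_clique(cols))
```

In practice you would first drop constant columns, since they are compatible with everything and carry no branching information.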

 
