Low bootstrap phylogenetic trees
6.1 years ago
asitsur • 0

Hi,

I am trying to generate a phylogenetic tree of a certain enzyme (~160 aa) which is highly conserved throughout Eukarya. I am especially interested in its evolution in Metazoa. My problem is that no matter what I do, I always end up with trees that have low bootstrap support.

My workflow is as follows: I collected 31 aa sequences from 31 species representing Bilateria (mostly Lophotrochozoa and Ecdysozoa but also Deuterostomia), Cnidaria, Ctenophora, Porifera and Placozoa. I rooted the tree with the choanoflagellate Monosiga brevicollis.

I use MAFFT-linsi for alignment. I usually also use GUIDANCE with several cutoff values to eliminate unreliable columns.

I generate the tree using RAxML gui with the following settings:

ML + slow bootstrap, 1000 runs, 1000 replications, PROTGAMMA with LG (or JTT) as the protein model + empirical frequencies.

Any idea what I can change/add in order to increase the bootstrap support of the tree? I am at a loss...

alignment phylogeny
1
You could check whether your enzyme is represented in TreeFam and try adding your sequences to the corresponding family/tree, and/or try TreeBeST (or its Ensembl Compara variant) with your sequences.

1
How many columns do you have after removing the unreliable columns?

You could also try Gblocks and trimAl for removing poorly aligned columns.
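To get a quick feel for how many columns survive filtering, here is a minimal sketch in Python. It is not what Gblocks or trimAl do internally; it only drops columns whose gap fraction exceeds a threshold. The function name `drop_gappy_columns`, the 0.5 default and the toy alignment are my own assumptions for illustration.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5, gap_chars="-?"):
    """Remove alignment columns that are mostly gaps.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    Returns (trimmed alignment dict, list of kept 0-based column indices).
    Hypothetical helper: a crude stand-in for dedicated trimming tools.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all aligned sequences must have the same length")

    kept = []
    for col in range(length):
        # Count gap characters in this column across all sequences.
        gaps = sum(seq[col] in gap_chars for seq in seqs)
        if gaps / len(seqs) <= max_gap_fraction:
            kept.append(col)

    trimmed = {name: "".join(alignment[name][c] for c in kept) for name in names}
    return trimmed, kept


# Toy alignment (made-up sequences, not real data).
toy = {"sp1": "MK-LV", "sp2": "MR-LV", "sp3": "MKALV", "sp4": "M--LI"}
trimmed, kept = drop_gappy_columns(toy)
print(len(kept), "columns kept out of", len(toy["sp1"]))
```

If only a few dozen columns remain for 31 taxa, low bootstrap values would not be surprising.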

Perhaps, you could try ProtTest for model selection.

EDIT: How do you identify/collect the orthologs in 31 species? This is very important.

0
Any particular reason for not using -m PROTGAMMAAUTO in RAxML? I am not sure about using empirical frequencies with so few sequences in the MSA. Have you curated your alignment manually? Have you tried any other alignment algorithms?

2
6.1 years ago

If it is highly conserved, you might simply not have enough signal. First, why do you include only 31 aa of the total of ~160 aa?

If the aa sequence is highly conserved, it would help to use the corresponding DNA sequences (aligned using the amino-acid alignment as a template), if available. These might provide additional signal, as long as the synonymous sites are not mutationally saturated, which is possible considering that you are working with very distant taxa. However, I think it is worthwhile to try.
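Using the protein alignment as a template just means copying one codon per aligned residue and three gap characters per alignment gap. A minimal sketch in Python, assuming each coding sequence is in frame and corresponds exactly to its protein (optionally with one trailing stop codon); `thread_codons` is a hypothetical helper, and it does not check that the codons actually translate to the aligned residues:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def thread_codons(aligned_protein, cds, gap="-"):
    """Build a codon alignment row from an aligned protein and its unaligned CDS.

    Assumes the CDS is in frame and encodes exactly the residues in
    aligned_protein; a single trailing stop codon is tolerated and dropped.
    """
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(ch != gap for ch in aligned_protein)

    # Drop one trailing stop codon if it is the only extra codon.
    if len(codons) == n_residues + 1 and codons[-1].upper() in STOP_CODONS:
        codons = codons[:-1]
    if len(codons) != n_residues:
        raise ValueError("CDS and protein lengths do not match")

    codon_iter = iter(codons)
    row = []
    for ch in aligned_protein:
        # A gap in the protein alignment becomes a three-nucleotide gap.
        row.append(gap * 3 if ch == gap else next(codon_iter))
    return "".join(row)


# Made-up example: protein row "M-KL" with its 9-nt coding sequence.
print(thread_codons("M-KL", "ATGAAACTG"))
```

Because third codon positions saturate first, it may be worth also analysing the first and second positions on their own.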

0
6.1 years ago
confusedious ▴ 420

If you'd like to amplify the signal and reduce the noise in your data, consider a clique analysis such as the one used by Gupta & Sneath (2007) in 'Application of the Character Compatibility Approach to Generalized Molecular Sequence Data: Branching Order of the Proteobacterial Subdivisions' (DOI: 10.1007/s00239-006-0082-2).

The largest clique of compatible sites might not be very large for your data, but you won't know until you try.
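To illustrate the general idea (this is a sketch of pairwise character compatibility, not necessarily the exact procedure in the paper): two unordered sites are treated as compatible when the graph linking their co-occurring states contains no cycle, and the largest set of mutually compatible sites is then a maximum clique. The function names and the toy columns below are my own assumptions; each column is a string with one character per taxon, in the same taxon order.

```python
from itertools import combinations


def pair_compatible(col_a, col_b, missing="-?X"):
    """Pairwise compatibility test for two unordered characters.

    Build a bipartite graph joining each state of col_a to the states of
    col_b it co-occurs with; the pair is compatible if that graph is acyclic.
    """
    edges = {(a, b) for a, b in zip(col_a, col_b)
             if a not in missing and b not in missing}
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        while parent.setdefault(node, node) != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(("A", a)), find(("B", b))
        if root_a == root_b:
            return False  # this edge would close a cycle
        parent[root_a] = root_b
    return True


def largest_compatible_clique(columns):
    """Return sorted indices of a largest set of pairwise compatible columns.

    Uses plain Bron-Kerbosch, which is exponential in the worst case.
    """
    n = len(columns)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if pair_compatible(columns[i], columns[j]):
            adj[i].add(j)
            adj[j].add(i)

    best = []

    def expand(current, candidates, excluded):
        nonlocal best
        if not candidates and not excluded:
            if len(current) > len(best):
                best = sorted(current)
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(range(n)), set())
    return best


# Four made-up columns over four taxa.
toy_columns = ["AABB", "AABB", "ABAB", "AAAB"]
print(largest_compatible_clique(toy_columns))
```

Constant columns are compatible with everything, so in practice you would first restrict the input to parsimony-informative sites.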