How to judge MSA from tree?
3.0 years ago

Hi there,

In my current odyssey through phylogeny and multiple sequence alignments I came across another question that I hope can be answered here.

I ran MSAs with several MAFFT algorithms and with MUSCLE, and compared the trees calculated from these alignments with the tree calculated from the original Pfam alignment (I wanted to try ProbCons as well, but neither the web interface nor the local install worked for me). The overall tree structure looks similar, but some subgroups/clades are rearranged.

However, the trees calculated from MAFFT with FFT-NS-i and L-INS-i and from the Pfam alignment look identical. Those from MUSCLE and MAFFT with E-INS-i are also nearly identical to each other. The tree from MAFFT with the global alignment approach G-INS-i looks different from all others. So I used six different alignments and ended up with three different trees: three alignments give one tree, two give a second tree, and one gives a third.

Since I'm just an interested microbiologist, the differences are hard for me to interpret, although I expected to see some small differences. Is one alignment more 'true' than another when different algorithms result in the same tree?

alignment phylogeny MAFFT MUSCLE Pfam

There are lots of additional factors you haven’t mentioned. What kind of sequences are they (protein or DNA)?

The algorithms are not necessarily equivalent. For example, a neighbour-joining tree and a maximum likelihood tree will very often have different topologies, even if the starting alignments are the same. There are two layers to consider: the alignment algorithm and the tree-building algorithm.

You don’t have a “ground truth” so it’s not realistic to say which is the most ‘real’.
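Even without a ground truth, you can at least put a number on how much two trees disagree instead of judging them by eye. Below is a minimal sketch, assuming both trees are available as Newick strings with identical leaf names; it counts the bipartitions found in only one of the two trees (the Robinson-Foulds distance, which is my suggestion here, not something used in the question). The function names and the toy trees are made up for illustration, and the tiny parser ignores branch lengths and internal labels.

```python
def parse_newick(text):
    """Parse a Newick string into nested lists; leaves are name strings.
    Branch lengths and internal node labels are ignored."""
    text = "".join(text.split()).rstrip(";")  # drop whitespace and final ';'
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos].split(":")[0]  # strip ':branch_length'

    def read_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume '('
            children = [read_node()]
            while text[pos] == ",":
                pos += 1
                children.append(read_node())
            pos += 1  # consume ')'
            read_label()  # skip internal label / branch length
            return children
        return read_label()

    return read_node()


def leaf_set(node):
    """All leaf names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def bipartitions(tree):
    """Non-trivial splits of the leaf set, treating the tree as unrooted."""
    all_leaves = leaf_set(tree)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        rest = all_leaves - clade
        if len(clade) > 1 and len(rest) > 1:
            found.add(frozenset([clade, rest]))  # unordered pair of sides
        return clade

    walk(tree)
    return found


def rf_distance(newick_a, newick_b):
    """Number of splits present in exactly one of the two trees."""
    a = bipartitions(parse_newick(newick_a))
    b = bipartitions(parse_newick(newick_b))
    return len(a ^ b)


# Toy trees, not real data:
print(rf_distance("((A,B),(C,D));", "((A,C),(B,D));"))   # 2
print(rf_distance("((A,B),(C,D));", "(A,(B,(C,D)));"))   # 0, same unrooted topology
```

A distance of 0 means the two topologies are identical once rooting is ignored. It says nothing about which tree is right, only how far apart they are.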


Sorry, I aligned protein sequences.

That different algorithms produce different alignments was what I expected. I just wonder which alignment to choose as the basis for an ML tree, since it is often said that one should compare different alignment algorithms before building a tree. But apart from published benchmarks, I didn't find a way to judge this on my own real data. Or is it acceptable to just stick to one preferred algorithm, since there is no 'ground truth'?


Deciding which alignment is best is still quite subjective. You might be most interested in the alignment that best preserves an active site, or some other feature that is relevant to your overall question.
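As a rough illustration of the active-site idea: if you know where a catalytic residue sits in each unaligned sequence, you can check whether a given alignment puts all of those residues in the same column. This is only a sketch under assumptions; the sequences, names and positions below are invented, positions are 0-based in the ungapped sequence, and `-` and `.` are treated as gap characters.

```python
def residue_to_column(aligned_seq, residue_index, gap_chars="-."):
    """Map a 0-based position in the ungapped sequence to its alignment column."""
    seen = -1
    for col, ch in enumerate(aligned_seq):
        if ch not in gap_chars:
            seen += 1
            if seen == residue_index:
                return col
    raise ValueError("residue_index is beyond the end of the sequence")


def site_columns(alignment, site_positions):
    """Alignment column of the annotated residue for every listed sequence."""
    return {
        name: residue_to_column(alignment[name], pos)
        for name, pos in site_positions.items()
    }


def site_is_aligned(alignment, site_positions):
    """True if all annotated residues fall into one and the same column."""
    return len(set(site_columns(alignment, site_positions).values())) == 1


# Hypothetical example: the catalytic His is residue 2 in seq1 and residue 3 in seq2.
site_positions = {"seq1": 2, "seq2": 3}
alignment_a = {"seq1": "MK-HDS", "seq2": "MKAHDS"}
alignment_b = {"seq1": "MKH-DS", "seq2": "MKAHDS"}

print(site_is_aligned(alignment_a, site_positions))  # True
print(site_is_aligned(alignment_b, site_positions))  # False
```

Running the same check on each candidate alignment gives one concrete, question-driven criterion for preferring one over another, although it only covers the sites you have annotated.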

It’s also not uncommon for people to edit an alignment manually, by eye, because humans are still quite good at the task.

Some tools are better with protein sequences than others. I believe Clustal Omega is one of the best performing for protein for instance.

