Question: MSA building algorithm
5.1 years ago by
ga29qal0 wrote:

Hi guys,

I have a question concerning algorithms for building multiple sequence alignments (MSAs). Which algorithm is "best" for constructing a multiple sequence alignment from a set of distantly related sequences?

To put the question more precisely: if I have a set of protein sequences that are annotated with the same function but have low amino acid sequence similarity (midnight or twilight zone), which algorithm works best to align them? The goal is to find sequences that ought to have the same annotation.

I tried Gotoh combined with a center star approach, but this does not seem to work very well. I'm now looking for an algorithm that provides a "better" multiple sequence alignment than "my" first approach:
  - calculate pairwise alignments for each sequence pair with Gotoh
  - find the sequence for which the sum of pairwise alignment scores to all other sequences is best
  - construct an MSA by combining the pairwise alignments of this center sequence with the others
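
For reference, the first two steps above can be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions, not the exact code I used: the function names `gotoh_score` and `pick_center` are made up for illustration, the match/mismatch scores stand in for a real substitution matrix such as BLOSUM62, and the gap penalties are arbitrary. The third step (merging the pairwise alignments into one MSA) is not shown.

```python
NEG_INF = float("-inf")


def gotoh_score(a, b, match=1, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Global alignment score with affine gaps (Gotoh recurrences).

    Assumption: a gap of length k costs gap_open + k * gap_extend, and
    a simple match/mismatch score replaces a real substitution matrix.
    """
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]
    # X: a[i-1] aligned to a gap (gap in b)
    # Y: b[j-1] aligned to a gap (gap in a)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Extending an existing gap is cheaper than opening a new one.
            X[i][j] = max(M[i - 1][j] + gap_open + gap_extend,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open + gap_extend,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])


def pick_center(seqs, score=gotoh_score):
    """Return the index of the sequence whose summed pairwise score
    against all other sequences is highest (the center of the star)."""
    totals = [0] * len(seqs)
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            s = score(seqs[i], seqs[j])
            totals[i] += s
            totals[j] += s
    return max(range(len(seqs)), key=totals.__getitem__)


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "TTTT"]  # toy input, not real protein data
    print("center index:", pick_center(toy))
```

Note that the center is chosen from pairwise scores alone, so with very divergent sequences a poor pairwise alignment feeds straight into the final MSA.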

5.1 years ago by
Andreas2.4k wrote:

Hi there,

I don't think there is a 'best' algorithm for this problem. Aligning multiple protein sequences in or below the 'twilight zone' is notoriously hard. However, some benchmark data sets try to model exactly this case. For example, have a look at the Prefab results in Table 2 here: For protein sequences with low identity, MSAProbs seems to work best, at least in that specific setup using Prefab.



Another paper that may be of interest is: "AlexSys: a knowledge-based expert system for multiple sequence alignment construction and analysis." (PubMed:20530533), which illustrates how the various MSA programs can complement each other.

