Question: Please could someone explain multiple sequence alignment?
4.7 years ago
I need to report my findings, but I am not 100% sure how to go about it. I have used Clustal Omega, T-Coffee and MUSCLE. After inputting my sequences, how do you read the results? Some of my sequences are thousands of bp or aa long, so pasting them into a document isn't exactly clear or particularly useful. What do I get out of aligning these sequences? I understand that you can infer common ancestors, but I don't understand how to interpret the results to show that.

Also, what is the difference between these applications and how do you know which one is the best?!

4.7 years ago
A multiple sequence alignment algorithm takes a set of parameters that contribute to an alignment score, i.e., the criterion it uses to choose the best alignment as it proceeds. One way to evaluate how reliable your alignment is would be to change some of them, such as the mismatch penalty or the gap opening/extension penalties, and see whether or not your results change in a meaningful way.
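To make the parameter point concrete, here is a minimal Python sketch of a sum-of-pairs score with an affine gap penalty. It is a simplified scoring scheme chosen for illustration (the function names and the match/mismatch/gap values are hypothetical), not the objective that Clustal Omega, T-Coffee, MUSCLE or MAFFT actually optimise; those programs use substitution matrices and progressive, profile-based or consistency-based methods. It scores two candidate alignments of the same two sequences under two different gap-opening penalties.

```python
from itertools import combinations

GAP = "-"


def pair_score(a, b, match=1, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Score two aligned rows with an affine gap penalty.

    Columns where both rows are gaps are skipped, so each pair is scored on
    its own projected alignment. A gap run of length L costs
    gap_open + (L - 1) * gap_extend.
    """
    score = 0
    prev_gap = None  # which row held the gap in the previous scored column
    for x, y in zip(a, b):
        if x == GAP and y == GAP:
            continue  # gap-gap columns contribute nothing to this pair
        if x == GAP or y == GAP:
            gap_row = "a" if x == GAP else "b"
            # extending the same gap run is cheaper than opening a new one
            score += gap_extend if prev_gap == gap_row else gap_open
            prev_gap = gap_row
        else:
            score += match if x == y else mismatch
            prev_gap = None
    return score


def sum_of_pairs(rows, **params):
    """Sum of pair_score over every pair of rows in the alignment."""
    return sum(pair_score(a, b, **params) for a, b in combinations(rows, 2))


if __name__ == "__main__":
    # Two hypothetical alignments of the same sequences, ACGT and AGCT.
    no_gaps = ["ACGT", "AGCT"]      # 2 matches, 2 mismatches
    with_gaps = ["A-CGT", "AGC-T"]  # 3 matches, 2 separate gaps
    for gap_open in (-3, -1):
        print(gap_open, sum_of_pairs(no_gaps, gap_open=gap_open),
              sum_of_pairs(with_gaps, gap_open=gap_open))
    # prints "-3 0 -3" and then "-1 0 1"
```

With the stiffer gap-opening penalty the gap-free alignment scores higher; with the cheaper one the gapped alignment wins. If parts of a real alignment flip like this when you vary the penalties, those regions deserve less confidence.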

What you get out is, literally, an alignment: an inference of homology in which each character is placed alongside putatively homologous characters. Most analyses require that characters be identified as such, especially in phylogenetics and molecular evolution. If you know that one of your sequences is ancestral, you can make inferences about derived character states. Imagine you have two aligned sequences:


If you know that the 'AAAAA' sequence is ancestral, then you can say that there are four derived Gs and a deletion. If you have regions that are difficult to align, it might make sense to be conservative and remove or mask them (see Gblocks).
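Both the ancestral-state reasoning and the masking step can be sketched in a few lines of Python. The alignment used below is hypothetical: it is one arrangement consistent with "four derived Gs and a deletion" relative to AAAAA, and the position of the gap is arbitrary. The masking function is a deliberately crude gap-fraction filter with an assumed threshold of 0.5; it is not Gblocks, which also applies conservation and block-length criteria.

```python
GAP = "-"


def classify_columns(ancestral, derived):
    """Label each column of a derived row relative to a known ancestral row."""
    if len(ancestral) != len(derived):
        raise ValueError("aligned rows must have the same length")
    labels = []
    for anc, der in zip(ancestral, derived):
        if anc == GAP and der == GAP:
            labels.append("empty")
        elif anc == GAP:
            labels.append("insertion")  # residue absent from the ancestor
        elif der == GAP:
            labels.append("deletion")   # ancestral residue lost
        elif anc == der:
            labels.append("conserved")
        else:
            labels.append(f"substitution {anc}->{der}")
    return labels


def mask_gappy_columns(rows, max_gap_fraction=0.5):
    """Keep only columns whose fraction of gaps is at most max_gap_fraction."""
    n = len(rows)
    keep = [
        col for col in range(len(rows[0]))
        if sum(row[col] == GAP for row in rows) / n <= max_gap_fraction
    ]
    return ["".join(row[col] for col in keep) for row in rows]


if __name__ == "__main__":
    # Hypothetical alignment; the gap position is arbitrary.
    print(classify_columns("AAAAA", "GGGG-"))
    # ['substitution A->G', 'substitution A->G', 'substitution A->G',
    #  'substitution A->G', 'deletion']
    print(mask_gappy_columns(["AC-GT", "A--GT", "ACTGT"]))
    # ['ACGT', 'A-GT', 'ACGT']
```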

There are several applications to visualize your alignments: Geneious, Se-Al, Jalview, etc.

I prefer to use MUSCLE and MAFFT, depending on my needs. There are published papers comparing the accuracy of different alignment algorithms that you can look at.
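Benchmark studies commonly judge an aligner by how many residue pairs from a trusted reference alignment it reproduces, a sum-of-pairs style agreement score. The Python sketch below computes that fraction; the function names are hypothetical. On your own data there is usually no trusted reference, so running it on, say, Clustal Omega output against MUSCLE output measures how much the two programs agree, not which one is right. Regions where they disagree are good candidates for caution or masking.

```python
from itertools import combinations

GAP = "-"


def aligned_pairs(rows):
    """Set of residue pairs that share a column.

    A residue is identified by (row index, position in its ungapped sequence),
    so two alignments of the same sequences can be compared directly.
    """
    positions = [0] * len(rows)
    pairs = set()
    for col in range(len(rows[0])):
        residues = []
        for i, row in enumerate(rows):
            if row[col] != GAP:
                residues.append((i, positions[i]))
                positions[i] += 1
        pairs.update(combinations(residues, 2))
    return pairs


def sp_agreement(reference, test):
    """Fraction of the reference's aligned residue pairs that test reproduces."""
    ref_seqs = [row.replace(GAP, "") for row in reference]
    test_seqs = [row.replace(GAP, "") for row in test]
    if ref_seqs != test_seqs:
        raise ValueError("alignments must hold the same sequences in the same order")
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        raise ValueError("reference alignment has no aligned residue pairs")
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


if __name__ == "__main__":
    # Hypothetical outputs from two programs; one is treated as the reference.
    program_a = ["A-CGT", "AGC-T"]
    program_b = ["ACGT", "AGCT"]
    print(sp_agreement(program_a, program_b))  # 2 of 3 reference pairs shared
```

Note that the score is not symmetric: swapping which alignment is treated as the reference changes the denominator.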

Thank you for your answer!

So if you had an alignment that was 16,789 base pairs long and you needed to report it, how would you go about it? Are identity matrices useful alongside the phylogenetic tree spewed out by the programs? I've just tried putting it into MView and it looks more confusing than it did before! I'm never going to understand this!

What exactly do you need to report? You might upload alignments as part of the supplemental data for a manuscript, but not much about the alignment itself is really used. You could use a summary matrix, like pairwise sequence identity, but that tells you more about your samples themselves than the alignment.
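A pairwise identity matrix like the one described above is easy to compute yourself from an alignment saved in FASTA format, which is one of the output formats these programs can write. The Python sketch below assumes that format; the function names and the toy alignment are hypothetical. Identity here means identical residues divided by the columns where neither sequence has a gap, which is only one of several conventions, so the numbers may not match those a particular program reports.

```python
from itertools import combinations

GAP = "-"


def read_aligned_fasta(text):
    """Parse FASTA-formatted alignment text into (names, rows)."""
    names, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            names.append(line[1:].strip())
            rows.append("")
        elif not rows:
            raise ValueError("sequence data before the first '>' header")
        else:
            rows[-1] += line  # sequences may be wrapped over several lines
    return names, rows


def percent_identity(a, b):
    """Identical residues / columns where neither row has a gap, as a percent.

    Other conventions divide by the alignment length or by the length of
    the shorter ungapped sequence instead.
    """
    compared = identical = 0
    for x, y in zip(a, b):
        if x == GAP or y == GAP:
            continue
        compared += 1
        identical += x.upper() == y.upper()
    return 100.0 * identical / compared if compared else float("nan")


def identity_matrix(rows):
    """Symmetric matrix of pairwise percent identities (diagonal = 100)."""
    n = len(rows)
    matrix = [[100.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = percent_identity(rows[i], rows[j])
    return matrix


if __name__ == "__main__":
    example = ">s1\nACGT\n>s2\nAC-T\n>s3\nTCGA\n"  # hypothetical toy alignment
    names, rows = read_aligned_fasta(example)
    for name, values in zip(names, identity_matrix(rows)):
        print(name, " ".join(f"{v:6.1f}" for v in values))
```

As the answer says, a table like this describes how similar your samples are to one another; it says little about whether the alignment itself is good.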

"What exactly do you need to report?"
This is exactly the question I asked myself! The instructions I was given were to obtain and align some sequences and then report them. It sounds rather ambiguous to me!

I've managed to format the alignments to look somewhat clear and I've put in the identity matrix. I'm going to try and interpret it all now and then compare the different programs.

