How Can I Interpret A Multiple Sequence Alignment?
5
2
11.5 years ago
Giselle ▴ 130

Hey!

How can I interpret a multiple sequence alignment? I have a FASTA file containing 5 sequences, which are orthologs.

What should I look for? How can I find conserved regions? Here is an example: http://i.imgur.com/zGCHE.png

multiple • 33k views
0

Hello,

Has nothing changed in the field of MSA analysis after almost 4.5 years?

1

Please use ADD COMMENT or ADD REPLY to respond to previous posts, so that this thread remains logically structured and easy to follow. I have now moved your post, but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

It seems most appropriate to me to open a separate thread for your question, since there is no direct link with this thread.

3
11.5 years ago
D-Horse ▴ 110

Hi, I think you could use this tool: ClustalW, http://www.ebi.ac.uk/Tools/msa/clustalw2/ First submit your sequences, then get the result and click the tab "Result Summary" - "Start Jalview". You can also see the tree by clicking "Guide Tree". I hope this helps.

2

A guide tree is really just used by progressive alignment programs to determine the order of alignment, nothing more. See for example one of the Muscle papers, http://www.biomedcentral.com/1471-2105/5/113, for an explanation. Guide trees can be really, really crude sometimes. It's better to compute a real phylogenetic tree once you have the multiple alignment.

0

Just keep in mind that guide trees are not evolutionary trees!

Andreas

0

I think guide trees are also called phylogenetic trees. Aren't those evolutionary trees? What are the differences?

0

All clear :) Thank you!

3
11.5 years ago
Asaf 9.4k

If you want to find conserved regions, you can use Rate4site with the tree you got and the MSA to estimate the evolutionary rate at each amino acid position. If the distances between the species are even, or you're not sure about them, you can use plotcon from EMBOSS to plot the conservation using the desired window size and substitution matrix.
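
To get a feel for what a windowed conservation plot measures, here is a minimal Python sketch. It is not what plotcon or Rate4site actually compute: it simply assumes the fraction of sequences sharing the most common residue as the per-column score (no substitution matrix, gaps counted as mismatches) and averages that over a sliding window. The toy alignment and the function names are made up for illustration.

```python
from collections import Counter

def column_identity(alignment):
    """Per column: fraction of sequences carrying the most common residue.
    Gaps ('-') count as mismatches. Simplified stand-in, not plotcon's score."""
    n_seqs = len(alignment)
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / n_seqs)
    return scores

def windowed(scores, window):
    """Mean score over every run of `window` consecutive columns."""
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]

# Hypothetical aligned sequences (all the same length, '-' marks a gap).
toy = [
    "MKTAYIAK",
    "MKTAYLAK",
    "MKSAYIGK",
    "MKTA-IAR",
    "MRTAYVAK",
]

per_column = column_identity(toy)
smoothed = windowed(per_column, window=3)

for start, score in enumerate(smoothed, start=1):
    print(f"columns {start}-{start + 2}: {score:.2f}")
```

Windows with a high average are candidates for conserved regions; a real analysis would weight substitutions with a matrix such as BLOSUM62 rather than treating every mismatch equally.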

2
11.3 years ago
scapella ▴ 390

Hi Giselle,

After building your Multiple Sequence Alignment (MSA) with any of the available programs, you can consider the residues (amino acids) at each position (column) of the alignment to be homologs, meaning that they share a common evolutionary history. If you are 100% sure your sequences are orthologs, then the residues in each column are orthologous to one another. Depending on the number of gaps in each column, you can infer other evolutionary events such as insertions (few residues, a lot of gaps) and deletions (few gaps, a lot of residues). As for the conserved blocks, they are the ones that look very well aligned (few gaps, few inconsistencies), but if you want to be sure about them, you could try programs such as GBlocks, trimAl or BMGE, which detect and remove poorly aligned or misaligned columns in your alignment.
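
As a rough illustration of the gap reasoning above, the following Python sketch computes the gap fraction of each column and drops columns above a threshold. This is only a naive gap filter under assumed settings (the 0.5 cutoff, the toy alignment and the function names are all made up here); GBlocks, trimAl and BMGE use more elaborate criteria that are not reproduced.

```python
def gap_fractions(alignment):
    """Fraction of sequences with a gap ('-') in each column."""
    n_seqs = len(alignment)
    return [column.count("-") / n_seqs for column in zip(*alignment)]

def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Keep only columns whose gap fraction is at most the threshold.
    The 0.5 default is an arbitrary choice for this example."""
    fractions = gap_fractions(alignment)
    keep = [i for i, f in enumerate(fractions) if f <= max_gap_fraction]
    return ["".join(seq[i] for i in keep) for seq in alignment]

# Hypothetical aligned sequences (same length, '-' marks a gap).
toy = [
    "MK-TAYW",
    "MK-TAY-",
    "MKQTAY-",
    "MK-TA--",
    "MR-TAY-",
]

fractions = gap_fractions(toy)
trimmed = drop_gappy_columns(toy)

for position, fraction in enumerate(fractions, start=1):
    print(f"column {position}: {fraction:.1f} gaps")
print(trimmed)
```

In this toy example columns 3 and 7 are mostly gaps with a single residue, the pattern described above as a likely insertion, while column 6 has a single gap among residues, which looks more like a deletion in one sequence.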

And, of course, if you want to reconstruct a phylogenetic tree once you have your alignment, use a program dedicated to that. I'd recommend not using a guide tree as a phylogenetic tree.

2
11.3 years ago
Burlappsack ▴ 680

Hello, if you have a ClustalW-formatted alignment, you can easily view and edit the results in PFAAT, pfaat.sourceforge.net. Once the alignment is imported, PFAAT lets you build a neighbor-joining tree (Analysis -> Neighbor Joining Tree), which helps visualize the relationships between sequences. You can also compute an information score for each column with Analysis -> Conservation -> Information Score, with a couple of different scoring schemes and similarity matrices.
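
For readers wondering what a per-column information score looks like, here is a small Python sketch based on Shannon entropy. It is an assumption that this resembles PFAAT's score; PFAAT's exact scoring schemes are not reproduced here. The sketch assumes a 20-letter protein alphabet, ignores gaps, and applies no small-sample correction; the toy alignment and names are hypothetical.

```python
import math
from collections import Counter

def information_content(column, alphabet_size=20):
    """Information content in bits: log2(alphabet size) minus the Shannon
    entropy of the residues in the column. Gaps ('-') are ignored."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0  # nothing to score in an all-gap column
    total = len(residues)
    entropy = -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(residues).values()
    )
    return math.log2(alphabet_size) - entropy

# Hypothetical aligned sequences (same length).
toy = ["MKA", "MKG", "MRA", "MRG"]

scores = [information_content(column) for column in zip(*toy)]
for position, score in enumerate(scores, start=1):
    print(f"column {position}: {score:.2f} bits")
```

A fully conserved column reaches the maximum of log2(20), about 4.32 bits, and the score drops as the column becomes more variable.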

2
9.9 years ago

The answer to the question "How can I interpret my multiple sequence alignment?" always depends on the context in which you want to use the MSA.

There are uses of multiple sequence alignments (MSAs) that do not assume that residues in the same column are "homologous" (a very tricky word, and one I prefer to avoid), i.e. that any differences between the residues in that column are due only to point substitution events. For example, the sequence logos of signal peptides you find on this page

http://www.cbs.dtu.dk/services/SignalP-3.0/background/dataset.php

could be thought of as being based on an (ungapped) MSA of a set of not-necessarily-"homologous" signal peptides, but it makes sense to do this because the results are being interpreted in a structural/functional rather than an evolutionary context.

Having said that, "relatedness" of the sequences in an MSA is an assumption underlying many (probably most...?) applications of alignments. The example above is given just to highlight the importance of understanding the context in which you want to use the MSA.

It all depends on knowing why you're interested in building the MSA in the first place. Some people find it helpful to think about which sequences they don't want in their alignment, and then to work out why.

An MSA I love to ask students to interpret is the one in figure 2 of this article in PNAS

http://www.pnas.org/content/106/50/21149.long