Question

Similarity Score Of Multiple Sequence Alignment

14.5 years ago

Ananth

Hello,

I have a file with protein sequences for which I would like to know the similarity score of the multiple sequence alignment.

I have aligned these sequences using ClustalW, but all I get are pairwise identity scores!

I am not looking for the pairwise identity or similarity score, but the similarity score of the multiple sequence alignment, so that I can conclude that "this group of sequences are x% similar with each other".

Is there a tool that gives a measure of the similarity of the sequences? Or a method for calculating this score?

Please help!

Thank you, Ananth

Written 14.5 years ago by Ananth, updated 14.5 years ago by User 0431

The similarity score depends on the substitution matrix used. So you should never say "this group of sequences are x% similar with each other" but rather "this group of sequences are x% similar with each other given this specific substitution matrix". Also, check that you are doing a global alignment and not a local one.

Reply by Giovanni M Dall'Olio, 14.5 years ago

Thank you Giovanni,

As you correctly pointed out, the score depends on the substitution matrix. Given a specific substitution matrix and a global alignment, is there a way to calculate this similarity score for an MSA?

Reply by Ananth, 14.5 years ago

How can I run MstatX?

I tried this command, but it is not working:

Mstat -m test.fa -s trident

Could you please give me an example command?

Reply written 14.0 years ago by User 0431, updated 5.9 years ago by Ram

./mstatx -i test.fa -s trident -g

Reply by oshadhisamarasinghe, 5.8 years ago

Ram · Answer 1 · 2011-01-24

14.5 years ago

Bilouweb

I have made a tool to derive statistics from a multiple alignment. It gives a score for each column of the multiple alignment, given a substitution matrix. Here is the link (GitHub): MstatX (use the -s trident option).

Hope it can help. If you need any help, just ask.

EDIT: The question of how to measure conservation (or similarity) in a multiple alignment is quite difficult, as discussed in these questions: Conservation Score Of Amino Acid Positions In Human Proteins and Entropy From A Multiple Sequence Alignment With Gaps.

A first measure can be calculated with the following algorithm (the well-known sum of pairs):

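float sum_of_pairs(char **msa, int nb_row, int nb_column,
                   float (*similarity_score)(char, char)) {
  /* msa[r][c] is the residue of sequence r at alignment column c */
  float total = 0.0f;
  for (int c = 0; c < nb_column; ++c) {
    float sum = 0.0f;
    /* score every unordered pair of sequences (r, s) in this column */
    for (int r = 0; r < nb_row - 1; ++r) {
      for (int s = r + 1; s < nb_row; ++s) {
        sum += similarity_score(msa[r][c], msa[s][c]);
      }
    }
    /* mean score over the nb_row * (nb_row - 1) / 2 pairs in this column */
    total += sum / (nb_row * (nb_row - 1) / 2);
  }
  return total / nb_column; /* mean over all columns */
}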

where similarity_score(a, b) returns the score of residues a and b in your substitution matrix.
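The loop above leaves similarity_score abstract. Here is a minimal sketch of the matrix lookup, assuming a hypothetical toy matrix over the DNA alphabet "ACGT" with made-up values (+2 match, -1 transition, -2 transversion). It is not BLOSUM62, PAM, or whatever matrix MstatX uses, and scoring any pair involving a gap '-' (or an unknown symbol) as 0 is also an assumption.

```c
#include <string.h>

/* Hypothetical toy matrix, illustrative values only (not a published matrix).
   Rows/columns follow the order of ALPHABET. */
static const char ALPHABET[] = "ACGT";
static const int TOY_MATRIX[4][4] = {
    /*        A   C   G   T */
    /* A */ { 2, -2, -1, -2},
    /* C */ {-2,  2, -2, -1},
    /* G */ {-1, -2,  2, -2},
    /* T */ {-2, -1, -2,  2},
};

/* Score of one residue pair. ASSUMPTION: a gap '-', a lowercase letter, or any
   other symbol outside ALPHABET makes the pair score 0. */
float similarity_score(char a, char b) {
    /* strchr would match the terminating '\0', so reject it explicitly */
    if (a == '\0' || b == '\0') return 0.0f;
    const char *pa = strchr(ALPHABET, a);
    const char *pb = strchr(ALPHABET, b);
    if (pa == NULL || pb == NULL) return 0.0f;
    return (float)TOY_MATRIX[pa - ALPHABET][pb - ALPHABET];
}
```

Swapping in a different matrix changes every column score, which is exactly why a sum-of-pairs value is only meaningful together with the name of the matrix used.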

Answer written 14.5 years ago by Bilouweb, updated 5.9 years ago by Ram

I added two links to related questions.

Reply by Bilouweb, 14.5 years ago

Thanks Bilouweb! It was helpful. :)

Reply by Ananth, 14.5 years ago

Is it possible for MstatX to output a final MSA score? When I ran it, I could only find ways to output per-column scores. Thank you for the software package!

Reply by Lee Katz, 14.3 years ago

Thanks for using MstatX! I can add a total score computed as the mean of all column scores. I will also add a DNA matrix for multiple alignments of DNA.

Reply by Bilouweb, 14.3 years ago

Thanks! I think it would be helpful to have a total score too, similar to the one that Clustal or MUSCLE would output.

Reply by Lee Katz, 14.3 years ago

What is the difference between the wentropy and trident statistics?

Reply by oshadhisamarasinghe, 5.8 years ago

score 6 · Answer 2 · 2011-01-24

I think the answer is "no". The reason is that I cannot think of a meaningful way to define the % identity of a multiple sequence alignment.

If one defines it as the fraction of aligned positions that are identical across all sequences, the % identity would automatically be lower the more sequences you have in the alignment (it can never go up when a sequence is added). It would thus not be comparable between different alignments.
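To make that concrete, here is a sketch of the "identical across all sequences" definition in C. fully_identical_fraction is a hypothetical helper, not taken from any tool, and treating a column that contains a gap '-' as never identical is an assumption. Adding a sequence can only keep a column identical or break it, never restore it, which is why the value cannot increase.

```c
/* Fraction of alignment columns in which every sequence has the same residue.
   seqs[r][c] is residue c of aligned sequence r; all rows have length ncol.
   ASSUMPTION: a column containing a gap '-' never counts as identical. */
double fully_identical_fraction(const char *const *seqs, int nrow, int ncol) {
    if (nrow < 1 || ncol < 1) return 0.0;
    int identical = 0;
    for (int c = 0; c < ncol; ++c) {
        char first = seqs[0][c];
        int same = (first != '-');
        /* stop as soon as one sequence disagrees with the first */
        for (int r = 1; r < nrow && same; ++r)
            same = (seqs[r][c] == first);
        identical += same;
    }
    return (double)identical / ncol;
}
```

With the two rows "ACGT" and "ACGA" this gives 0.75; adding a third row "ACTT" drops it to 0.5, even though the first two sequences are unchanged.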

score 1 · Answer 3 · 2011-01-24

14.5 years ago

Alastair Kerr

It depends on what you mean by 'measure of similarity'. A PAM distance for a protein alignment? Global % identity?

Look at Sean Eddy's tools: alistat (built from the SQUID package) might meet your needs. It is also installed as part of the HMMER package.
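One common single-number summary for a whole MSA is the mean of the pairwise identities over all pairs of sequences. The sketch below computes that in C. pair_identity and average_pairwise_identity are hypothetical names; this is not alistat's code, and counting only the columns where both sequences have a residue (not '-') is one convention among several, assumed here.

```c
/* Percent identity of one aligned pair over ncol columns.
   ASSUMPTION: columns where either sequence has a gap '-' are skipped,
   so the denominator is the number of residue-residue columns. */
static double pair_identity(const char *a, const char *b, int ncol) {
    int compared = 0, same = 0;
    for (int c = 0; c < ncol; ++c) {
        if (a[c] == '-' || b[c] == '-') continue;
        ++compared;
        same += (a[c] == b[c]);
    }
    return compared ? 100.0 * same / compared : 0.0;
}

/* Mean of pair_identity over all nrow * (nrow - 1) / 2 pairs of sequences. */
double average_pairwise_identity(const char *const *seqs, int nrow, int ncol) {
    if (nrow < 2) return 0.0;
    double sum = 0.0;
    int npairs = 0;
    for (int r = 0; r < nrow - 1; ++r) {
        for (int s = r + 1; s < nrow; ++s) {
            sum += pair_identity(seqs[r], seqs[s], ncol);
            ++npairs;
        }
    }
    return sum / npairs;
}
```

For the two rows "ACGT" and "ACGA" this returns 75.0. Unlike the "identical in every sequence" definition, an average over pairs does not automatically fall as sequences are added, although it still hides how the identity is distributed across pairs.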

Answer by Alastair Kerr, 14.5 years ago

Thank you Alastair

For a pairwise alignment, ClustalW reports a score giving the percentage identity shared between the two sequences.

By 'measure of similarity' I meant: instead of a score for two sequences, can we have a score that describes the similarity of the whole multiple sequence alignment?

Reply by Ananth, 14.5 years ago