Question: Similarity Score Of Multiple Sequence Alignment
3
8.7 years ago by
Ananth70
Ananth70 wrote:

Hello,

I have a file with protein sequences for which I would like to know the similarity score of the multiple sequence alignment.

I have aligned these sequences using ClustalW, but all I get is the pairwise identity score.

I am not looking for the pairwise identity or similarity score, but for the similarity score of the multiple sequence alignment as a whole, so that I can conclude that "this group of sequences is x% similar to each other".

Is there any tool that gives a measure of similarity of the sequences? Or any method for calculating this score?

Please help!

Thank you, Ananth

multiple • 16k views
ADD COMMENTlink written 8.7 years ago by Ananth70
1

The similarity score depends on the substitution matrix used. So you should never say "this group of sequences is x% similar to each other" but rather "this group of sequences is x% similar to each other given this specific substitution matrix". Moreover, check that you are doing a global alignment and not a local one.
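
A small sketch of why the matrix matters: the same pair of aligned sequences gets a different "percent similar" value depending on which residue pairs count as similar. Both rules below are made-up toy rules for illustration (they are not BLOSUM or PAM), and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

using SimilarFn = std::function<bool(char, char)>;

// Toy rule 1 (assumption): only identical residues count as similar.
inline bool strict_rule(char a, char b) { return a == b; }

// Toy rule 2 (assumption): identical residues, or both in the aliphatic group I/L/V.
inline bool aliphatic_rule(char a, char b) {
    const std::string group = "ILV";
    return a == b || (group.find(a) != std::string::npos &&
                      group.find(b) != std::string::npos);
}

// Percentage of aligned columns whose residue pair is "similar" under the given rule.
inline double percent_similar(const std::string& a, const std::string& b,
                              const SimilarFn& is_similar) {
    assert(a.size() == b.size() && !a.empty());
    std::size_t hits = 0;
    for (std::size_t c = 0; c < a.size(); ++c) {
        if (is_similar(a[c], b[c])) ++hits;
    }
    return 100.0 * hits / a.size();
}
```

For the toy pair "MKIL" and "MKLV", `strict_rule` gives 50% while `aliphatic_rule` gives 100%, so a percentage is only meaningful together with the scoring rule that produced it.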

ADD REPLYlink written 8.7 years ago by Giovanni M Dall'Olio26k

Thank you Giovanni,

As you correctly pointed out, it has to be for a specific substitution matrix and a global alignment. Given that, is there a way to calculate this similarity score for an MSA?

ADD REPLYlink written 8.7 years ago by Ananth70

How can I run MstatX?

I have tried this command, but it is not working:

Mstat -m test.fa -s trident

Could you please give me an example command?

ADD REPLYlink modified 13 days ago by RamRS24k • written 8.2 years ago by User 043150

./mstatx -i test.fa -s trident -g

ADD REPLYlink modified 8 hours ago • written 8 hours ago by oshadhisamarasinghe0
8
8.7 years ago by
Bilouweb1.1k
Saclay, France
Bilouweb1.1k wrote:

I have made a tool to derive statistics from a multiple alignment. It gives a score for each column of the multiple alignment, given a substitution matrix. Here is the link (GitHub): MstatX (use the -s trident option).

I hope it helps. If you need any help, just ask.

EDIT: The question of how to measure conservation (or similarity) in a multiple alignment is quite difficult, as discussed in these questions: "Conservation Score Of Amino Acid Positions In Human Proteins" and "Entropy From A Multiple Sequence Alignment With Gaps".

A first measure can be calculated by the following algorithm (the famous sum of pairs):

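// msa[c][r] is the residue in column c of sequence (row) r.
// similarity_score(a, b) looks up the pair in the substitution matrix.
Msa msa;
float total = 0.0f;
const int nb_pairs = nb_row * (nb_row - 1) / 2;  // sequence pairs per column
for (int c = 0; c < nb_column; ++c) {
  float sum = 0.0f;
  for (int r = 0; r < nb_row - 1; ++r) {
    for (int s = r + 1; s < nb_row; ++s) {
      sum += similarity_score(msa[c][r], msa[c][s]);
    }
  }
  total += sum / nb_pairs;  // mean pairwise score for this column
}
total /= nb_column;  // mean over all columns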

Here similarity_score is a lookup in your scoring (substitution) matrix.
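
The sum-of-pairs loop above can be fleshed out as a small self-contained C++ sketch. The names `sum_of_pairs_score` and `identity_score` are hypothetical (they are not MstatX functions), the alignment is stored row-wise as equal-length strings, and the toy identity scorer stands in for a real substitution matrix. Gaps are treated like any other character, which is an assumption a real tool would probably revisit.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy scorer (assumption): 1 for identical characters, 0 otherwise.
// Replace it with a lookup in a real substitution matrix.
inline double identity_score(char a, char b) {
    return a == b ? 1.0 : 0.0;
}

// Mean pairwise score per column, averaged over all columns.
// rows[r][c] is the residue of sequence r in column c.
inline double sum_of_pairs_score(
    const std::vector<std::string>& rows,
    const std::function<double(char, char)>& score = identity_score) {
    assert(rows.size() >= 2 && !rows[0].empty());
    const std::size_t nb_row = rows.size();
    const std::size_t nb_column = rows[0].size();
    for (const std::string& row : rows) {
        assert(row.size() == nb_column);  // aligned rows share one length
    }
    const double nb_pairs = nb_row * (nb_row - 1) / 2.0;
    double total = 0.0;
    for (std::size_t c = 0; c < nb_column; ++c) {
        double sum = 0.0;
        for (std::size_t r = 0; r + 1 < nb_row; ++r) {
            for (std::size_t s = r + 1; s < nb_row; ++s) {
                sum += score(rows[r][c], rows[s][c]);
            }
        }
        total += sum / nb_pairs;  // mean pairwise score for this column
    }
    return total / nb_column;  // mean over all columns
}
```

With the identity scorer, the toy alignment AC / AC / AG scores 1 for the first column and 1/3 for the second, so the overall value is 2/3. Passing a different `score` callable changes the result, which is why the matrix should always be reported alongside the number.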

ADD COMMENTlink modified 13 days ago by RamRS24k • written 8.7 years ago by Bilouweb1.1k
1

I added two links to related questions.

ADD REPLYlink written 8.7 years ago by Bilouweb1.1k
1

Thanks bilouweb ! It was helpful. :)

ADD REPLYlink written 8.7 years ago by Ananth70
1

Is it possible for MstatX to output a final MSA score? When I ran it, I could only find ways to output per-column scores. Thank you for the software package!

ADD REPLYlink written 8.5 years ago by Lee Katz2.9k
1

Thanks for using MstatX! I can add a total score computed as the mean of all column scores. I will also add a DNA matrix for multiple alignments of DNA.

ADD REPLYlink written 8.5 years ago by Bilouweb1.1k
1

Thanks! I think it would be helpful to have a total score too, similar to the one that Clustal or MUSCLE would output.

ADD REPLYlink written 8.5 years ago by Lee Katz2.9k

What is the difference between the wentropy and trident statistics?

ADD REPLYlink written 8 hours ago by oshadhisamarasinghe0
6
8.7 years ago by
Copenhagen, Denmark
Lars Juhl Jensen11k wrote:

I think the answer is "no". The reason is that I cannot think of a meaningful way to define the % identity of a multiple sequence alignment.

If one defines it as the fraction of aligned positions that are identical across all sequences, the % identity would automatically be lower the more sequences you have in the alignment. It would thus not be comparable between different alignments.
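
To illustrate the point, here is a minimal C++ sketch that computes the fraction of columns identical across all sequences. `fully_conserved_fraction` is a hypothetical helper name, and the sequences in the note below are made-up toy data, not a real alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Fraction of alignment columns in which every sequence carries the same character.
// rows[r][c] is the residue of sequence r in column c; gaps count as ordinary characters.
inline double fully_conserved_fraction(const std::vector<std::string>& rows) {
    assert(!rows.empty() && !rows[0].empty());
    const std::size_t nb_column = rows[0].size();
    std::size_t conserved = 0;
    for (std::size_t c = 0; c < nb_column; ++c) {
        bool same = true;
        for (std::size_t r = 1; r < rows.size(); ++r) {
            assert(rows[r].size() == nb_column);
            if (rows[r][c] != rows[0][c]) { same = false; break; }
        }
        if (same) ++conserved;
    }
    return static_cast<double>(conserved) / nb_column;
}
```

For the toy rows ACDE and ACDF the value is 0.75; adding a third row ACGE drops it to 0.5. Adding a sequence can never turn a non-conserved column into a conserved one, so the value can only stay the same or fall as the alignment grows.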

ADD COMMENTlink written 8.7 years ago by Lars Juhl Jensen11k
1
8.7 years ago by
Alastair Kerr5.2k
The University of Edinburgh, UK
Alastair Kerr5.2k wrote:

It depends on what you mean by 'measure of similarity'. A PAM value, if it is a protein alignment? Global % identity?

Look at Sean Eddy's tools. alistat (built from the SQUID package) might meet your needs. It is also installed as part of the HMMER package.

ADD COMMENTlink written 8.7 years ago by Alastair Kerr5.2k

Thank you Alastair

For a pairwise sequence alignment, ClustalW reports the sequence identity as a score giving the percentage identity shared between the two sequences.

By 'measure of similarity' I meant: instead of having a score for two sequences, can we have a score that gives an idea of the similarity of the whole multiple sequence alignment?
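
One way to get a single number of this kind is to average the pairwise percent identity over every pair of sequences in the alignment. The sketch below does this in C++. `pairwise_identity` and `mean_pairwise_identity` are hypothetical names, '-' is assumed to be the gap character, and identity is computed only over columns where neither sequence has a gap, which is just one of several conventions in use.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Percent identity between two aligned sequences, counting only
// columns where neither sequence has a gap ('-').
inline double pairwise_identity(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());
    std::size_t compared = 0, identical = 0;
    for (std::size_t c = 0; c < a.size(); ++c) {
        if (a[c] == '-' || b[c] == '-') continue;
        ++compared;
        if (a[c] == b[c]) ++identical;
    }
    return compared == 0 ? 0.0 : 100.0 * identical / compared;
}

// Mean of pairwise_identity over all unordered pairs of sequences.
inline double mean_pairwise_identity(const std::vector<std::string>& rows) {
    assert(rows.size() >= 2);
    double sum = 0.0;
    std::size_t nb_pairs = 0;
    for (std::size_t r = 0; r + 1 < rows.size(); ++r) {
        for (std::size_t s = r + 1; s < rows.size(); ++s) {
            sum += pairwise_identity(rows[r], rows[s]);
            ++nb_pairs;
        }
    }
    return sum / nb_pairs;
}
```

Unlike the "identical in every sequence" definition, this average does not automatically shrink as sequences are added, but it still depends on the gap convention chosen and says nothing about similarity between non-identical residues.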

ADD REPLYlink written 8.7 years ago by Ananth70