Does anyone have a script to determine the score of a multiple sequence alignment? Hopefully using BioPerl?
Per Alastair's comment, I tried MstatX, at https://github.com/gcollet/MstatX.
The program is easy to use and offers several ways of calculating the score. However, it does not print a final score to the terminal; it writes a score for each column to a file, which I then summed myself. It also does not come packaged with a simple matrix for DNA, so out of the box it is geared toward protein. I quickly made up a DNA matrix on the spot, which may not be technically correct.
tar zxvf gcollet-MstatX-31481c6.tar.gz
cd gcollet-MstatX-31481c6
make
./mstatx -ma path/to/file -b -sp data/dna.mat
perl -e 'while(<>){$score+=$_;}print "$score\n";' < file.cons
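For anyone who would rather do the summing step outside Perl, here is a minimal Python sketch of the same idea. It assumes, as the one-liner above does, that each line of file.cons begins with one numeric column score; the function name sum_column_scores is made up for illustration, and I have not checked it against every MstatX output format.

```python
def sum_column_scores(path):
    """Sum the leading numeric field of every line in a per-column score file.

    Assumption: each data line starts with a number (the column score).
    Blank lines and lines whose first field is not numeric are skipped.
    """
    total = 0.0
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # blank line
            try:
                total += float(fields[0])
            except ValueError:
                continue  # header or other non-numeric line
    return total


if __name__ == "__main__":
    import sys

    # Usage: python sum_scores.py file.cons
    print(sum_column_scores(sys.argv[1]))
```

If the score turns out to sit in a different field of each line, change the index in fields[0] accordingly.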
The matrix file I made, data/dna.mat (it probably isn't the most correct thing I could have made):
H DNA matrix
D DNA matrix by Lee Katz
R LIT:1902106 PMID:1438297
A Henikoff, S. and Henikoff, J.G.
T Amino acid substitution matrices from protein blocks
J Proc. Natl. Acad. Sci. USA 89, 10915-10919 (1992)
* matrix in 1/3 Bit Units
M rows = ATCGN-, cols = ATCGN-
2.
-1. 2.
-1. -1. 2.
-1. -1. -1. 2.
-1. -1. -1. -1. -2.
-2. -2. -2. -2. -2. -2.
//
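To show what a matrix like this does to a score, here is a small Python sketch of a generic sum-of-pairs calculation using the values above. It is only an illustration under stated assumptions: the lower-triangular rows are copied by hand from the file, the function names are invented, and this is plain sum-of-pairs, not necessarily the exact statistic MstatX computes.

```python
from itertools import combinations

# Row/column order as declared on the "M" line of the matrix file.
SYMBOLS = "ATCGN-"

# Lower-triangular rows copied from the matrix file above.
LOWER_TRIANGLE = [
    [2],
    [-1, 2],
    [-1, -1, 2],
    [-1, -1, -1, 2],
    [-1, -1, -1, -1, -2],
    [-2, -2, -2, -2, -2, -2],
]


def build_lookup(symbols, rows):
    """Expand a lower-triangular matrix into a symmetric pair -> score dict."""
    lookup = {}
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            lookup[(symbols[i], symbols[j])] = value
            lookup[(symbols[j], symbols[i])] = value
    return lookup


def column_score(column, lookup):
    """Sum the matrix score over every unordered pair of residues in a column."""
    return sum(lookup[(a, b)] for a, b in combinations(column.upper(), 2))


def alignment_score(sequences, lookup):
    """Sum the column scores over all columns of an aligned set of sequences."""
    if len({len(seq) for seq in sequences}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return sum(column_score("".join(col), lookup) for col in zip(*sequences))


if __name__ == "__main__":
    lookup = build_lookup(SYMBOLS, LOWER_TRIANGLE)
    # Columns score 6, 0, 6 and -2 respectively.
    print(alignment_score(["ACG-", "ACGT", "ATGT"], lookup))  # 10
```

Note that with this matrix a gap paired with a gap costs -2; many sum-of-pairs schemes score that pair as 0 instead, so the choice is worth a second look.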
MUSCLE 3.8 has a '-spscore' option that computes the sum-of-pairs (SP) objective score for a multiple sequence alignment, e.g. path/to/muscle -spscore file_name
To extract just the score into a variable (pseudocode):
Compute the SP score with muscle and write it to a log file (e.g. path/to/muscle -spscore file_name -log <log_file>)
Read the log file
Iterate through each line of the file
If the line contains the string 'SP=' (perl e.g. if (/SP=/))
Print the segment after the match (perl e.g. print $')
NB: You could even extract the matching line with unix 'grep'.
Download muscle from: http://www.drive5.com/muscle/
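The steps above could be sketched in Python roughly as follows. This only covers the parsing half and assumes the log contains a number directly after the string 'SP='; the function name extract_sp_scores and the sample log line are hypothetical, so check them against what your muscle version actually writes.

```python
import re

# A number (optionally signed, decimal, or in exponent notation) right after "SP=".
SP_PATTERN = re.compile(r"SP=\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def extract_sp_scores(log_text):
    """Return every value that follows 'SP=' in the given log text, as floats."""
    scores = []
    for line in log_text.splitlines():
        match = SP_PATTERN.search(line)
        if match:
            scores.append(float(match.group(1)))
    return scores


if __name__ == "__main__":
    # Hypothetical log content, for illustration only.
    sample = "File=aln.fasta;SP=1234.5\nsome unrelated line\n"
    print(extract_sp_scores(sample))  # [1234.5]
```

To use it on a real run, read the file you passed to -log (for example with open(log_path).read(), where log_path is whatever name you chose) and pass the text to extract_sp_scores.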
"The Father. The Son. The Holy Spirit. And the South African National Bioinformatics Institute."
Look at this thread: Similarity Score Of Multiple Sequence Alignment
I thought someone had already asked this, but after reading the question it looks like it hasn't come up yet. Related thread: Pairwise Local And Global Alignment