I'm trying to use a custom match/mismatch matrix with ClustalW 2.1 for a pairwise DNA alignment. The program reads the matrix file, at least partially: it throws an error if the row and column labels are invalid, but not if a numerical entry in the matrix is replaced with a word or simply removed.
Given that, it is perhaps unsurprising that the program seems to ignore the contents of the matrix: I cannot get the matrix to influence either the alignment itself or the scores produced. For example, I have this FASTA file (matrix.fa):
    >A
    AC
    >B
    T
My matrix (matrix.txt) can be either:
       A   G   C   T   *
    A  5   5   5  10   5
    G  5   5   5   5   5
    C  5   5   5   5   5
    T 10   5   5   5   5
    *  5   5   5   5   5

or:

       A   G   C   T   *
    A  5   5   5   5   5
    G  5   5   5   5   5
    C  5   5   5  10   5
    T  5   5  10   5   5
    *  5   5   5   5   5
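Since ClustalW does not complain about a missing or non-numeric entry, it may help to rule out a malformed file independently. The following is a minimal Python sketch, not part of ClustalW: `parse_matrix` and `asymmetric_pairs` are names made up for illustration, and the parser only assumes the whitespace-separated layout shown above (lines starting with `#` are skipped on the assumption that they are comments).

```python
def parse_matrix(text):
    """Parse a square, whitespace-separated substitution matrix.

    Returns a dict mapping (row_label, column_label) -> int score.
    Raises ValueError if an entry is missing or is not an integer.
    """
    rows = [line.split() for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]
    if not rows:
        raise ValueError("empty matrix")
    labels, body = rows[0], rows[1:]
    if len(body) != len(labels):
        raise ValueError("expected %d data rows, found %d"
                         % (len(labels), len(body)))
    scores = {}
    for row in body:
        row_label, cells = row[0], row[1:]
        if row_label not in labels:
            raise ValueError("unknown row label %r" % row_label)
        if len(cells) != len(labels):
            raise ValueError("row %r has %d entries, expected %d"
                             % (row_label, len(cells), len(labels)))
        for col_label, cell in zip(labels, cells):
            try:
                scores[(row_label, col_label)] = int(cell)
            except ValueError:
                raise ValueError("non-numeric entry %r at (%s, %s)"
                                 % (cell, row_label, col_label)) from None
    return scores


def asymmetric_pairs(scores):
    """List label pairs (a, b) whose score differs from that of (b, a)."""
    return sorted((a, b) for (a, b) in scores
                  if a < b and scores[(a, b)] != scores[(b, a)])


# The first matrix from the question, used here as example input.
MATRIX_TA = """
  A G C T *
A 5 5 5 10 5
G 5 5 5 5 5
C 5 5 5 5 5
T 10 5 5 5 5
* 5 5 5 5 5
"""

scores = parse_matrix(MATRIX_TA)
print(scores[("T", "A")], scores[("T", "C")], asymmetric_pairs(scores))
```

If both files pass this check, the problem is unlikely to be the file layout itself.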
By varying the matrix, I should be able to make T and A align, or T and C align, depending on whether T/A or T/C has the greater score in the matrix. I cannot get ClustalW to do this; it is as if it is completely ignoring the values in my matrix. It always produces:
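To make that expectation concrete, here is a small Python sketch that scores the two candidate alignments by hand. It is only an illustration of the reasoning, not of ClustalW's actual scoring: `alignment_score` is a made-up helper, the flat `gap_penalty` of 15 is an arbitrary assumption, and only the matrix entries needed for these two sequences are included.

```python
# Only the entries that matter for sequences AC and T are listed.
MATRIX_TA = {("A", "T"): 10, ("C", "T"): 5}   # first matrix: T/A favoured
MATRIX_TC = {("A", "T"): 5, ("C", "T"): 10}   # second matrix: T/C favoured


def alignment_score(top, bottom, scores, gap_penalty):
    """Sum substitution scores column by column, charging a flat
    penalty for every column that contains a gap character."""
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total -= gap_penalty
        else:
            total += scores[(x, y)]
    return total


GAP = 15  # hypothetical value, identical for both candidates

for name, matrix in (("T/A matrix", MATRIX_TA), ("T/C matrix", MATRIX_TC)):
    t_under_a = alignment_score("AC", "T-", matrix, GAP)
    t_under_c = alignment_score("AC", "-T", matrix, GAP)
    print(name, "T under A:", t_under_a, "T under C:", t_under_c)
```

Both candidates contain exactly one gap, so under this simplified model the gap cost cancels and the matrix alone decides which placement wins.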
     CLUSTAL 2.1 Multiple Sequence Alignments

    Sequence format is Pearson
    Sequence 1: A             2 bp
    Sequence 2: B             1 bp
    Start of Pairwise alignments
    Aligning...

    Sequences (1:2) Aligned. Score:  0
    Guide tree file created:   [matrix.dnd]

    There are 1 groups
    Start of Multiple Alignment

    Aligning...
    Group 1:                     Delayed
    Alignment Score -15

    CLUSTAL-Alignment file created  [matrix.aln]

    CLUSTAL 2.1 multiple sequence alignment

    A               AC
    B               -T
My command line:
    clustalw matrix.fa -align -dnamatrix=matrix.txt && cat matrix.aln
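To compare the two matrices side by side, the same invocation can be scripted. This is a rough Python sketch under stated assumptions: it uses only the flags from the command above, it assumes the alignment is written next to the input as `<name>.aln` (as the log shows for `matrix.aln`), and the file names `matrix_ta.txt` and `matrix_tc.txt` in the usage comment are hypothetical names for the two matrices.

```python
import subprocess
from pathlib import Path


def clustalw_command(fasta, matrix, executable="clustalw"):
    """Build the argument list for the command line shown above."""
    return [executable, str(fasta), "-align", "-dnamatrix=%s" % matrix]


def alignment_for(fasta, matrix, executable="clustalw"):
    """Run ClustalW with one matrix and return the text of the .aln file.

    Assumes the output lands beside the input with an .aln suffix.
    """
    subprocess.run(clustalw_command(fasta, matrix, executable),
                   check=True, stdout=subprocess.DEVNULL)
    return Path(fasta).with_suffix(".aln").read_text()


def matrices_give_same_alignment(fasta, matrix_a, matrix_b):
    """Return True if both matrices produce byte-identical .aln output."""
    return alignment_for(fasta, matrix_a) == alignment_for(fasta, matrix_b)


# Example usage (requires clustalw on PATH and both matrix files to exist):
#   matrices_give_same_alignment("matrix.fa", "matrix_ta.txt", "matrix_tc.txt")
```

In my case this comparison would report identical output for both matrices, which is exactly the problem.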
Can you show me an example of how to run ClustalW properly with a custom matrix, such that the matrix actually influences the output? Am I forgetting something important? Is there a spec for "BLAST format" matrices that I may be ignoring?