Question: ClustalW nucleotide alignment with a custom scoring matrix (possible bug?)
4
gravatar for evanfloden
3.7 years ago by
evanfloden190
Centre for Genomic Regulation, Barcelona
evanfloden190 wrote:

For the last week I have been attempting to test ClustalW to confirm that the software is indeed working correctly with a custom nucleotide scoring matrix. Consider the following as a simple test, I wish to align the two nucleotide sequences using the example given here by Wikipedia for Needleman-Wunch alignment. To do this I planned to use the profile alignment option of ClustalW which should perform the standard dynamic programming alignment between two sequences or profiles (not full multiple sequence alignment, just a single pairwise profile alignment)

This problem was previously pointed out Clustalw Ignoring Custom Weight Matrix on BioStars however it was never addressed.

The Wikipedia Example:

Align the following two sequences: GCATGCU & GATTACA;

With the following rules: Mismatch = -1; Match= + 1; Gap = -1;

This alignment results in 3 co-optimal solutions. To test if ClustalW performs as expected I created the three files:

 

SeqA.fa

>SeqA

GCATGCU

 

SeqB.fa

>SeqB

GATTACA

 

matrix.txt

   A   G   C   T   U   *
A  1   -1  -1  -1  -1  -1
G  -1  1   -1  -1  -1  -1
C  -1  -1  1   -1  -1  -1
T  -1  -1  -1  1   -1  -1
U  -1  -1  -1  -1  1   -1
*  -1  -1  -1  -1  -1  -1

 

Now running the command:

clustalw2 -PROFILE -PROFILE1=SeqA.fa -PROFILE2=SeqB.fa -GAPOPEN=1 -GAPEXT=1 -DNAMATRIX=matrix.txt -TYPE=DNA

By examining the program via debug flags it becomes apparent that the scoring scheme being used is the IUB scheme (1.9 for match, 0 for mismatch). One can get the exact same results and scores (add -DEBUG=5 to the command) using the command:

clustalw2 -PROFILE -PROFILE1=SeqA.fa -PROFILE2=SeqB.fa -GAPOPEN=1 -GAPEXT=1 -DNAMATRIX=IUB -TYPE=DNA

or even removing the matrix completely as in:

clustalw2 -PROFILE -PROFILE1=SeqA.fa -PROFILE2=SeqB.fa -GAPOPEN=1 -GAPEXT=1 -TYPE=DNA

So it appears that ClustalW is completely ignoring a custom nucleotide matrix and not throwing any errors. I have tested multiple different ways to try get the program to score the alignment correctly but have been unable to (different formats of the substitution matrices, putting the sequences in one file, trying to perform the alignment as protein sequences etc). It is obviously reading the nucleotide matrix as it will throw an error if it does not exist or if it is in the wrong format, it just seems to ignore it completely when it actually performs the alignment.

If anyone has any suggestions in would be much appreciated as I am starting to bang my head against a wall. Also, if anyone can suggest a program which can perform pairwise profile alignments using standard dynamic programming it would also be useful for testing.

 

 

 

bug clustalw alignment • 1.5k views
ADD COMMENTlink modified 3.6 years ago • written 3.7 years ago by evanfloden190
1
gravatar for evanfloden
3.6 years ago by
evanfloden190
Centre for Genomic Regulation, Barcelona
evanfloden190 wrote:

After several hours of trial and error I discovered that there is a flag regarding transition weightings that overrides the custom matrix.

 

I believe that -TRANSWEIGHT=f must be set to 0 otherwise the custom matrix is ignored. Slightly obtuse use of flag in my opinion.

ADD COMMENTlink written 3.6 years ago by evanfloden190
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 729 users visited in the last hour