Question: needleman-wunsch (ggsearch implementation) not symmetric
0
gravatar for comacke
3.5 years ago by
comacke10
United States
comacke10 wrote:

I am aligning two sequences using ggsearch, an implementation of Needleman-Wunsch that is part of the FASTA package. The alignments are not the same when the query and subject are switched. From my understanding of the basic Needleman-Wunsch algorithm, the results should be symmetric. Below are the two sequences followed by the alignments.

>3hea:B
STFVAKDGTQIYFKDWGSGKPVLFSHGWPLDADMWEYQMEYLSSRGYRTIAFDRRGFGRSDQPWTGNDYDTFADDI
AQLIEHLDLKEVTLVGFSMGGGDVARYIARHGSARVAGLVLLGAVTPIFGQKPDYPQGVPLDVFARFKTELLKDRA
QFISDFNAPFYGINKGQVVSQGVQTQTLQIALLASLKATVDCVTAFAETDFRPDMAKIDVPTLVIHGDGDQIVPFE
TTGKVAAELIKGAELKVYKDAPHGFAVTHAQQLNEDLLAFLKR

>1hkh:B
GYITVGNENSTPIELYYEDQGSGQPVVLIHGYPLDGHSWERQTRELLAQGYRVITYDRRGFGGSSKVNTGYDYDTF
AADLHTVLETLDLRDVVLVGFSMGTGELARYVARYGHERVAKLAFLASLEPFLVQRDDNPEGVPQEVFDGIEAAAK
GDRFAWFTDFYKNFYNLDENLGSRISEQAVTGSWNVAIGSAPVAAYAVVPAWIEDFRSDVEAVRAAGKPTLILHGT
KDNILPIDATARRFHQAVPEADYVEVEGAPHGLLWTHADEVNAALKTFLAK

This is the alignment when 1hkh:B is the query and 3hea:B is the subject.

>>3hea:B                                            (271 aa)
 n-w opt: 497  Z-score: 399.8  bits: 81.8 E(1): 2.4e-268
global/global (N-W) score: 497; 39.6% identity (69.3% similar) in 280 aa overlap 

               10        20        30        40        50        60
1hkh:B GYITVGNENSTPIELYYEDQGSGQPVVLIHGYPLDGHSWERQTRELLAQGYRVITYDRRG
       .  :   ...:  ..:..: :::.::.. ::.:::.  :: : . : ..:::.:..::::
3hea:B S--TFVAKDGT--QIYFKDWGSGKPVLFSHGWPLDADMWEYQMEYLSSRGYRTIAFDRRG
                   10        20        30        40        50      

               70        80        90       100       110       120
1hkh:B FGGSSKVNTGYDYDTFAADLHTVLETLDLRDVVLVGFSMGTGELARYVARYGHERVAKLA
       :: :..  :: :::::: :.  ..: :::..:.::::::: :..:::.::.:  ::: :.
3hea:B FGRSDQPWTGNDYDTFADDIAQLIEHLDLKEVTLVGFSMGGGDVARYIARHGSARVAGLV
         60        70        80        90       100       110      

              130       140       150       160       170       180
1hkh:B FLASLEPFLVQRDDNPEGVPQEVFDGIEAAAKGDRFAWFTDFYKNFYNLDENLGSRISEQ
       .:... :.. :. : :.::: .::  ...    ::  ...::   ::..  : :. .:.
3hea:B LLGAVTPIFGQKPDYPQGVPLDVFARFKTELLKDRAQFISDFNAPFYGI--NKGQVVSQG
        120       130       140       150       160         170    

              190       200        210       220       230         
1hkh:B AVTGSWNVAIGSAPVAAYAVVPAWIE-DFRSDVEAVRAAGKPTLILHGTKDNILPIDATA
       . : . ..:. ..  :.   : :. : ::: :.  . .   :::..::  :.:.:...:.
3hea:B VQTQTLQIALLASLKATVDCVTAFAETDFRPDMAKIDV---PTLVIHGDGDQIVPFETTG
          180       190       200       210          220       230

     240       250       260       270         
1hkh:B RRFHQAVPEADYVEVEGAPHGLLWTHADEVNAALKTFLAK
       .   . .  :.    . ::::.  :::...:  : .:: .
3hea:B KVAAELIKGAELKVYKDAPHGFAVTHAQQLNEDLLAFLKR
             240       250       260       270

 

This the alignment when the query and subject are switched.

 

>>1hkh:B                                            (279 aa)
n-w opt: 497  Z-score: 381.8  bits: 78.4 E(1): 1.1e-241
global/global (N-W) score: 497; 38.9% identity (69.3% similar) in 280 aa overlap 

                   10        20        30        40        50     
3hea:B STFVAKDGT----QIYFKDWGSGKPVLFSHGWPLDADMWEYQMEYLSSRGYRTIAFDRRG
       . ... . .    ..:..: :::.::.. ::.:::.  :: : . : ..:::.:..::::
1hkh:B GYITVGNENSTPIELYYEDQGSGQPVVLIHGYPLDGHSWERQTRELLAQGYRVITYDRRG
               10        20        30        40        50        60

         60        70        80        90       100       110     
3hea:B FGRSDQPWTGNDYDTFADDIAQLIEHLDLKEVTLVGFSMGGGDVARYIARHGSARVAGLV
       :: :..  :: :::::: :.  ..: :::..:.::::::: :..:::.::.:  ::: :.
1hkh:B FGGSSKVNTGYDYDTFAADLHTVLETLDLRDVVLVGFSMGTGELARYVARYGHERVAKLA
               70        80        90       100       110       120

        120       130       140       150       160         170   
3hea:B LLGAVTPIFGQKPDYPQGVPLDVFARFKTELLKDRAQFISDFNAPFYGI--NKGQVVSQG
       .:... :.. :. : :.::: .::  ...    ::  ...::   ::..  : :. .:.
1hkh:B FLASLEPFLVQRDDNPEGVPQEVFDGIEAAAKGDRFAWFTDFYKNFYNLDENLGSRISEQ
              130       140       150       160       170       180

          180       190       200       210          220       230
3hea:B VQTQTLQIALLASLKATVDCVTAFAETDFRPDMAKIDV---PTLVIHGDGDQIVPFETTG
       . : . ..:. ..  :.   : :. : ::: :.  . .   :::..::  :.:.:...:.
1hkh:B AVTGSWNVAIGSAPVAAYAVVPAWIE-DFRSDVEAVRAAGKPTLILHGTKDNILPIDATA
              190       200        210       220       230         

             240       250       260       270
3hea:B KVAAELIKGAELKVYKDAPHGFAVTHAQQLNEDLLAFLKR
       .   . .  :.    . ::::.  :::...:  : .:: .
1hkh:B RRFHQAVPEADYVEVEGAPHGLLWTHADEVNAALKTFLAK
     240       250       260       270         

 

When I use the Needleman-Wunsch alignment tool on the PDB website I get symmetric behavior. Using 1hkh:B as the query, these are the results from the PDB website (the page may take a second to load).

http://www.rcsb.org/pdb/workbench/workbench.do?action=pw_needlemanwunsch&mol=1hkh.B&mol=3hea.B

Using 3hea:B as the query, these are the results 

http://www.rcsb.org/pdb/workbench/workbench.do?action=pw_needlemanwunsch&mol=3hea.A&mol=1hkh.B

 

Maybe the non-symmetric behavior is due to some feature of the ggsearch implementation? Or perhaps there's an option, to change this behavior? Here's the command I used to run ggsearch.

ggsearch36 path/to/query.fa path/to/subject.fa -s BP62 -E 10 -F 0

From my understanding -s BP62 sets the scoring matrix to Blosum62 and the gap penalties to -11/-1, the same as in BLASTP.

-E sets the maximum evalue and -F sets the minimum evalue, which doesn't matter too much here since I am only aligning two sequences that I know are fairly close.

 

ADD COMMENTlink modified 3.5 years ago by Christian2.7k • written 3.5 years ago by comacke10
1

Barring a bug, the alignment is not symmetric if the substitution matrix is not symmetric. Are you sure that BP62 refers to the BLOSUM62 matrix ? On the FASTA web site page for ggsearch, there's no BLOSUM62 but a BlastP62 matrix.
 

ADD REPLYlink written 3.5 years ago by Jean-Karim Heriche18k

In the FASTA package manual (http://faculty.virginia.edu/wrpearson/fasta/fasta_guide.pdf page 16) it says

-s BP62 sets the scoring matrix to BLOSUM62 and the gap penalties to -11/-1, identical to BLASTP

ADD REPLYlink written 3.5 years ago by comacke10
1

Then Christian spotted the most likely explanation. It's possible that PDB picked the same solution twice from the possible ones just by chance. It could also be that PDB's implementation finds all optimal alignments then sorts them by % identity and/or other criterion (on PDB's site, the alignment returned looks like the one with the highest % identity from the two you got with ggsearch).

ADD REPLYlink written 3.5 years ago by Jean-Karim Heriche18k
5
gravatar for Christian
3.5 years ago by
Christian2.7k
Cambridge, US
Christian2.7k wrote:
The two alignments have the same score, so there are multiple optimal solutions.
ADD COMMENTlink written 3.5 years ago by Christian2.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1897 users visited in the last hour