needleman-wunsch (ggsearch implementation) not symmetric
1
0
Entering edit mode
8.6 years ago
comacke ▴ 10

I am aligning two sequences using ggsearch, an implementation of Needleman-Wunsch that is part of the FASTA package. The alignments are not the same when the query and subject are switched. From my understanding of the basic Needleman-Wunsch algorithm, the results should be symmetric. Below are the two sequences followed by the alignments.

>3hea:B
STFVAKDGTQIYFKDWGSGKPVLFSHGWPLDADMWEYQMEYLSSRGYRTIAFDRRGFGRSDQPWTGNDYDTFADDI
AQLIEHLDLKEVTLVGFSMGGGDVARYIARHGSARVAGLVLLGAVTPIFGQKPDYPQGVPLDVFARFKTELLKDRA
QFISDFNAPFYGINKGQVVSQGVQTQTLQIALLASLKATVDCVTAFAETDFRPDMAKIDVPTLVIHGDGDQIVPFE
TTGKVAAELIKGAELKVYKDAPHGFAVTHAQQLNEDLLAFLKR

>1hkh:B
GYITVGNENSTPIELYYEDQGSGQPVVLIHGYPLDGHSWERQTRELLAQGYRVITYDRRGFGGSSKVNTGYDYDTF
AADLHTVLETLDLRDVVLVGFSMGTGELARYVARYGHERVAKLAFLASLEPFLVQRDDNPEGVPQEVFDGIEAAAK
GDRFAWFTDFYKNFYNLDENLGSRISEQAVTGSWNVAIGSAPVAAYAVVPAWIEDFRSDVEAVRAAGKPTLILHGT
KDNILPIDATARRFHQAVPEADYVEVEGAPHGLLWTHADEVNAALKTFLAK

This is the alignment when 1hkh:B is the query and 3hea:B is the subject.

>>3hea:B                                            (271 aa)
 n-w opt: 497  Z-score: 399.8  bits: 81.8 E(1): 2.4e-268
global/global (N-W) score: 497; 39.6% identity (69.3% similar) in 280 aa overlap 

               10        20        30        40        50        60
1hkh:B GYITVGNENSTPIELYYEDQGSGQPVVLIHGYPLDGHSWERQTRELLAQGYRVITYDRRG
       .  :   ...:  ..:..: :::.::.. ::.:::.  :: : . : ..:::.:..::::
3hea:B S--TFVAKDGT--QIYFKDWGSGKPVLFSHGWPLDADMWEYQMEYLSSRGYRTIAFDRRG
                   10        20        30        40        50

               70        80        90       100       110       120
1hkh:B FGGSSKVNTGYDYDTFAADLHTVLETLDLRDVVLVGFSMGTGELARYVARYGHERVAKLA
       :: :..  :: :::::: :.  ..: :::..:.::::::: :..:::.::.:  ::: :.
3hea:B FGRSDQPWTGNDYDTFADDIAQLIEHLDLKEVTLVGFSMGGGDVARYIARHGSARVAGLV
         60        70        80        90       100       110

              130       140       150       160       170       180
1hkh:B FLASLEPFLVQRDDNPEGVPQEVFDGIEAAAKGDRFAWFTDFYKNFYNLDENLGSRISEQ
       .:... :.. :. : :.::: .::  ...    ::  ...::   ::..  : :. .:.
3hea:B LLGAVTPIFGQKPDYPQGVPLDVFARFKTELLKDRAQFISDFNAPFYGI--NKGQVVSQG
        120       130       140       150       160         170

              190       200        210       220       230
1hkh:B AVTGSWNVAIGSAPVAAYAVVPAWIE-DFRSDVEAVRAAGKPTLILHGTKDNILPIDATA
       . : . ..:. ..  :.   : :. : ::: :.  . .   :::..::  :.:.:...:.
3hea:B VQTQTLQIALLASLKATVDCVTAFAETDFRPDMAKIDV---PTLVIHGDGDQIVPFETTG
          180       190       200       210          220       230

     240       250       260       270
1hkh:B RRFHQAVPEADYVEVEGAPHGLLWTHADEVNAALKTFLAK
       .   . .  :.    . ::::.  :::...:  : .:: .
3hea:B KVAAELIKGAELKVYKDAPHGFAVTHAQQLNEDLLAFLKR
             240       250       260       270

This the alignment when the query and subject are switched.

>>1hkh:B                                            (279 aa)
n-w opt: 497  Z-score: 381.8  bits: 78.4 E(1): 1.1e-241
global/global (N-W) score: 497; 38.9% identity (69.3% similar) in 280 aa overlap 

                   10        20        30        40        50
3hea:B STFVAKDGT----QIYFKDWGSGKPVLFSHGWPLDADMWEYQMEYLSSRGYRTIAFDRRG
       . ... . .    ..:..: :::.::.. ::.:::.  :: : . : ..:::.:..::::
1hkh:B GYITVGNENSTPIELYYEDQGSGQPVVLIHGYPLDGHSWERQTRELLAQGYRVITYDRRG
               10        20        30        40        50        60

         60        70        80        90       100       110
3hea:B FGRSDQPWTGNDYDTFADDIAQLIEHLDLKEVTLVGFSMGGGDVARYIARHGSARVAGLV
       :: :..  :: :::::: :.  ..: :::..:.::::::: :..:::.::.:  ::: :.
1hkh:B FGGSSKVNTGYDYDTFAADLHTVLETLDLRDVVLVGFSMGTGELARYVARYGHERVAKLA
               70        80        90       100       110       120

        120       130       140       150       160         170
3hea:B LLGAVTPIFGQKPDYPQGVPLDVFARFKTELLKDRAQFISDFNAPFYGI--NKGQVVSQG
       .:... :.. :. : :.::: .::  ...    ::  ...::   ::..  : :. .:.
1hkh:B FLASLEPFLVQRDDNPEGVPQEVFDGIEAAAKGDRFAWFTDFYKNFYNLDENLGSRISEQ
              130       140       150       160       170       180

          180       190       200       210          220       230
3hea:B VQTQTLQIALLASLKATVDCVTAFAETDFRPDMAKIDV---PTLVIHGDGDQIVPFETTG
       . : . ..:. ..  :.   : :. : ::: :.  . .   :::..::  :.:.:...:.
1hkh:B AVTGSWNVAIGSAPVAAYAVVPAWIE-DFRSDVEAVRAAGKPTLILHGTKDNILPIDATA
              190       200        210       220       230

             240       250       260       270
3hea:B KVAAELIKGAELKVYKDAPHGFAVTHAQQLNEDLLAFLKR
       .   . .  :.    . ::::.  :::...:  : .:: .
1hkh:B RRFHQAVPEADYVEVEGAPHGLLWTHADEVNAALKTFLAK
     240       250       260       270

When I use the Needleman-Wunsch alignment tool on the PDB website I get symmetric behavior. Using 1hkh:B as the query, these are the results from the PDB website (the page may take a second to load).

http://www.rcsb.org/pdb/workbench/workbench.do?action=pw_needlemanwunsch&mol=1hkh.B&mol=3hea.B

Using 3hea:B as the query, these are the results

http://www.rcsb.org/pdb/workbench/workbench.do?action=pw_needlemanwunsch&mol=3hea.A&mol=1hkh.B

Maybe the non-symmetric behavior is due to some feature of the ggsearch implementation? Or perhaps there's an option, to change this behavior? Here's the command I used to run ggsearch.

ggsearch36 path/to/query.fa path/to/subject.fa -s BP62 -E 10 -F 0

From my understanding -s BP62 sets the scoring matrix to Blosum62 and the gap penalties to -11/-1, the same as in BLASTP.

-E sets the maximum evalue and -F sets the minimum evalue, which doesn't matter too much here since I am only aligning two sequences that I know are fairly close.

Needleman-Wunsch FASTA ggsearch alignment • 2.4k views
ADD COMMENT
1
Entering edit mode

Barring a bug, the alignment is not symmetric if the substitution matrix is not symmetric. Are you sure that BP62 refers to the BLOSUM62 matrix ? On the FASTA web site page for ggsearch, there's no BLOSUM62 but a BlastP62 matrix.

ADD REPLY
0
Entering edit mode

In the FASTA package manual page 16 it says

-s BP62 sets the scoring matrix to BLOSUM62 and the gap penalties to -11/-1, identical to BLASTP

ADD REPLY
1
Entering edit mode

Then Christian spotted the most likely explanation. It's possible that PDB picked the same solution twice from the possible ones just by chance. It could also be that PDB's implementation finds all optimal alignments then sorts them by % identity and/or other criterion (on PDB's site, the alignment returned looks like the one with the highest % identity from the two you got with ggsearch).

ADD REPLY
5
Entering edit mode
8.6 years ago
Christian ★ 3.0k
The two alignments have the same score, so there are multiple optimal solutions.
ADD COMMENT

Login before adding your answer.

Traffic: 1249 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6