Question: Gap Continuation Penalty With Dynamic Programming?
2
8.6 years ago by
User 5037290
User 5037290 wrote:

Hi. Given a match score, a mismatch score, and a gap penalty, two sequences can be aligned using dynamic programming. Is it also possible to use a gap continuation (extension) penalty when aligning two sequences with dynamic programming?

modified 6.4 years ago by Biostar ♦♦ 20 • written 8.6 years ago by User 5037290
3
8.6 years ago by
Michael Kuhn (5.0k)
EMBL Heidelberg
Michael Kuhn (5.0k) wrote:

Yes, this is possible. For example, the gap extension penalty has been implemented in JAligner and you can check the source code to see how it's done (in the `construct` function).

I am interested in performing it manually, on paper. Could you please explain it?

Please look at the linked source code to see how it is done.

Introducing a gap has to be penalized more heavily than extending an existing gap. For example, the decision to open a gap rather than accept a mismatch penalty matters much more than extending a gap of length 12 to length 13. That's why the two penalties are distinguished.
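To make the open-versus-extend distinction concrete, here is a minimal sketch of an affine gap penalty in Python. The function name and the default values (10 to open, 1 to extend) are assumptions chosen for illustration, not values from JAligner; the convention assumed here is that the first gap position costs `gap_open` and each further position costs `gap_extend`.

```python
def affine_gap_penalty(length, gap_open=10, gap_extend=1):
    """Penalty for a single gap spanning `length` consecutive positions.

    Assumed convention: the first position costs `gap_open`,
    every additional position costs `gap_extend`.
    """
    if length <= 0:
        return 0
    return gap_open + (length - 1) * gap_extend


# Opening a gap is expensive; going from 12 to 13 positions is cheap.
cost_of_opening = affine_gap_penalty(1)
cost_of_extending = affine_gap_penalty(13) - affine_gap_penalty(12)
```

With these assumed defaults, opening a gap costs ten times as much as extending one by a single position.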

3
8.6 years ago by
Haibao Tang (3.0k)
Mountain View, CA
Haibao Tang (3.0k) wrote:

See the introductory slides here. I think you understand how to fill the DP matrix. For each cell in the DP matrix, we pick the max over three directions from the three adjacent cells: UP, LEFT, DIAGONAL. UP and LEFT each introduce one gap; DIAGONAL gives a match/mismatch.
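As a rough sketch of the three-direction fill described above, with a plain linear gap penalty, something like the following Python could be used. The function name and the scoring values (match 1, mismatch -1, gap -2) are assumptions for illustration only; the function returns just the optimal global alignment score, not the alignment itself.

```python
def nw_score_linear(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score with a linear gap penalty (illustrative values)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]

    # First column / first row: the prefix is aligned entirely against gaps.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = score[i - 1][j - 1] + sub  # match / mismatch
            up = score[i - 1][j] + gap            # gap in b
            left = score[i][j - 1] + gap          # gap in a
            score[i][j] = max(diagonal, up, left)

    return score[n][m]
```

For example, `nw_score_linear("ACGT", "AGT")` aligns three matching characters and one gap, giving 3 - 2 = 1 under these assumed scores.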

Now the affine gap penalty makes the calculation more difficult. For each cell, we still pick the max of the three directions. But since the gap score is no longer linear (i.e. `Two gaps != 2 x one gap`), you'll have to consider all the cells to the LEFT and all the cells above (UP), rather than just the immediate neighbors.

Done this way, the computational complexity also increases from quadratic to cubic.
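The approach just described, scanning every cell to the left and every cell above, can be sketched in Python as follows. This is an illustrative sketch, not code from the slides: the function name, the `gap_cost` callback (which returns a negative score for a gap of a given length), and the scoring values are all assumptions. It works for any gap cost function, which is why each cell needs an inner loop.

```python
def nw_score_general_gap(a, b, gap_cost, match=1, mismatch=-1):
    """Global alignment score for an arbitrary gap cost function.

    `gap_cost(k)` is a hypothetical callback returning the (negative)
    score of one gap of length k. Each cell scans its whole row and
    column, so the running time is cubic.
    """
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = gap_cost(i)
    for j in range(1, m + 1):
        S[0][j] = gap_cost(j)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            best = S[i - 1][j - 1] + sub              # DIAGONAL
            for k in range(1, i + 1):                 # all cells UP
                best = max(best, S[i - k][j] + gap_cost(k))
            for k in range(1, j + 1):                 # all cells LEFT
                best = max(best, S[i][j - k] + gap_cost(k))
            S[i][j] = best
    return S[n][m]


def example_affine_cost(k):
    # Assumed affine scores: -5 to open, -1 per additional position.
    return -5 - (k - 1)
```

With the assumed affine cost, `nw_score_general_gap("AAAGGG", "AAATTGGG", example_affine_cost)` prefers one gap of length two (six matches, gap score -6) over two separate gaps.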

0
8.4 years ago by
Peter Kovac (70)
Bratislava, Slovakia
Peter Kovac (70) wrote:

If you can read Java code, here is a clear implementation of affine gap scores for the Needleman-Wunsch algorithm (and some other variants too). You can find more material on the author's site. That implementation follows directly from the explanations in the textbook Biological Sequence Analysis.

BTW, with affine gaps the computational complexity is still quadratic (although you have to keep three DP matrices in memory).
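For readers who prefer Python, here is a minimal sketch of the three-matrix idea. It is not a port of the linked Java code; the function name, matrix names (`M`, `X`, `Y`) and scoring values are assumptions for illustration. `M` holds alignments ending in a match/mismatch, `X` those ending with a gap in `b`, and `Y` those ending with a gap in `a`, so each cell only looks at its immediate neighbors.

```python
NEG_INF = float("-inf")


def nw_score_affine(a, b, match=1, mismatch=-1, gap_open=-5, gap_extend=-1):
    """Global alignment score with affine gaps using three DP matrices.

    Assumed convention: the first gap position scores `gap_open`,
    each further position scores `gap_extend`.
    """
    n, m = len(a), len(b)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends in match/mismatch
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with a[i-1] vs gap
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with gap vs b[j-1]

    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = sub + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Either open a new gap or extend the one already running.
            X[i][j] = max(M[i - 1][j] + gap_open,
                          Y[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          X[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend)

    return max(M[n][m], X[n][m], Y[n][m])
```

Every cell does a constant amount of work, so the running time stays proportional to the product of the two sequence lengths, at the cost of storing three matrices instead of one.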