Question: Help with dynamic programming algorithm
0
6 months ago by
laura_savana20
laura_savana20 wrote:

Hi everyone, I have many doubts about dynamic programming, in particular, I am interested about global alignment of two sequences considering both gap insertion and gap extension penalties.

I hope strongly that someone of you can help me. I am desperate.

I know that to compute the best global alignment of two sequences with dynamic programming I must follow this three steps:

1. Initialization of the score and the traceback matrices
2. Calculation of scores and filling in the score and traceback matrices
3. Inferring the alignment from the traceback matrix

We suppose that we have these two sequences:

seq 1 = ATTCGGTGA (length 9)

seq 2 = ACCCTGG (length 7)

and gap insertion = -10, gap extension = -2.

I create a matrix 10x8 initialized with 0 (I usually call this matrix "alignment matrix") and another matrix of the same sizes that I call "directions matrix".

In the first matrix I insert in each cell the maximum value of three possibles:

• diag = S(i – 1, j – 1 ) + r(i, j) // r is char replacement score
• up = S(i – 1, j) + gap_function(row_index, col_index, arrow, current_direction, gap_ins, gap_ext)
• left = S(i, j – 1) + gap_function(row_index, col_index, arrow, current_direction, gap_ins, gap_ext)

(I know that to choose the r(i, j) I need to substitution matrix)

Gap_function is a function that decide if add gap insertion or gap extension penalty based on the arrow in the cell where current cell value (in alignment matrix) come from.

This is Python pseudo-code:

``````gap_function (row_index, col_index, arrow, current_direction, gap_ins, gap_ext):
return gap_ins if arrow != current_direction
gap_ext otherwise
``````

At this link is possible see my alignment matrix: https://ibb.co/qs4JNt3

In the second matrix I simultaneously insert the direction from where value derived. For example, if the maximum value inserted in the cell (i,j) of the alignment matrix derived from diagonal, in cell (i,j) of directions matrix I insert "diag". Important! The first row of directions matrix (except at position 0) is initialized with "left" and the first column with "up" (also in this case, except at position 0), like this:

https://ibb.co/0FHF5JN

I know that in global alignment to reconstruct the aligment is used traceback, that consist on start from the last cell (bottom-right) of the directions matrix and return to the first cell (up-left).

Why to reconstruct alignment is used traceback? Is not right to start from the fist cell of directions matrix? Furthermore, is compulsory to start from bottom-right cell? Why?

The last questions.

Before I wrote that alignment matrix is initialized to zeros, but maybe is more correct initialize first row and first column with gap penalities? If yes, in what way? Should I consider both gap insertion and gap extension penalties? I should initialize my alignment matrix in this way (see link please)?

https://ibb.co/ByMMQ0B

Thank you so much.

modified 5 months ago by fishgolden390 • written 6 months ago by laura_savana20

How to add images to a Biostars post

2
5 months ago by
fishgolden390
fishgolden390 wrote:

Why to reconstruct alignment is used traceback? Is not right to start from the fist cell of directions matrix? Furthermore, is compulsory to start from bottom-right cell? Why?

"the fist cell of directions matrix" means top left cell? Because if you start from top left, you cannot know which directions/pathes will be connected to the goal. The wrong pathes end at cells without next pathes(dead end). I mean, you probably reach a dead end and have to go back to the last fork. and retry the process over and over. It's too redundant & tiresome. (sorry if I have misunderstood your question)

Before I wrote that alignment matrix is initialized to zeros, but maybe is more correct initialize first row and first column with gap penalities? If yes, in what way? Should I consider both gap insertion and gap extension penalties? I should initialize my alignment matrix in this way (see link please)?

I think the method uses zero for cells in first row and column is called "semi-global alignment". (With this method, an alignment stops at one of the cells in the last column or the last row.) "global alignment" uses gap penalties for those cells. https://en.wikipedia.org/wiki/Sequence_alignment#Global_and_local_alignments

Comparing semi-global alignment and global alignment, the answer of the following question may be easier to understand.

Furthermore, is compulsory to start from bottom-right cell? Why?

Because global alignment is the method which is designed to find optimal alignment utilizing all regions of both sequences. Semi-global or local are not.

1

"the fist cell of directions matrix" means top left cell?

Exactly! You understood what I was referring to.

Thank you!! So, in the global alignment to reconstruct the alignment I must start from the bottom-right cell in the direction matrix, instead in the semi-global I start from the greater value in the last row or last column, right?

Speaking of initialization of first row and first column of alignment matrix, yesterday I found that they are initialized in this way:

You know the reason why?

1

instead in the semi-global I start from the greater value in the last row or last column, right?

Yes, that's right.

You know the reason why?

I couldn't find the strange part. You mean "why they set the constant value (-2) for penalty"? One of the reasons may be it's very easy to implement (write code) compared with the affine gap penalty strategy.

Yes, that's right.

Ok, thank you.

You mean "why they set the constant value (-2) for penalty"?

No, no, I don't understand why they progressively multiply gap insertion penalty by 1, 2, 3, 4 etc... I don't understand why they initialized first row and first column of alignment matrix in this way (increasing progressively gap insertion penalty). Are you undestand what I saying?

1

I see. Now I understand. That is

Because global alignment is the method which is designed to find optimal alignment utilizing all regions of both sequences.

With that procedure, the gap length in 5' region becomes shorter & the alignment tends to utilize all regions.

``````seqA: domain1-domain2-domain3
seqB: domain1-domain2'-domain3
``````

domain1 and domain3 in seqA and seqB are identical. domain2 and domain2' are a bit different. & you want to align domain2 and domain2'.

The strategy of global alignment is something like that. If there are enough information about start and end of sequences, in this case, the domain2 or 2' sequences start at the ending point of domain1 and end at the starting point of domain3, there should be incremental gap penalty because longer gap insertions (may mean multiple insertion or deletion events of nucleotides) are less frequently occur compared with shorter gap insertions.

1

You really helped me! Thanks a lot! Now everything is clear.