Question on Smith-Waterman
4
0
Entering edit mode
8.3 years ago
freezedry ▴ 30

Hello all. I have a question with regards to smith waterman. Is it possible to modify the standard algo to make it more difficult to get indels in the alignment that are less than 5 residues apart?

I was thinking along the lines of changing the penalty cost for getting indels but i have no idea on how to go about it.

alignment sequence • 3.1k views
2
Entering edit mode
8.3 years ago
matted 7.8k

You could do it by working off the same ideas as affine gap alignment (see these random lecture notes or these ones).

The affine gap algorithm uses three dynamic programming matrices to optimize the alignment: (1) the best score of two prefixes that ends in an aligned character, (2) the best score of two prefixes where the first sequence is in a gap, and (3) the best score of two prefixes where the second sequence is in a gap. The transitions and scores are set up so that you pay an extra penalty when you open a gap (move from matrix 1 to 2 or 3), but not when you lengthen an existing gap (stay in matrix 2 or matrix 3).

So based on those ideas, you would need a matrix for each possible distance since the last gap in each sequence (0, 1, 2, 3, 4, or 5+), for each of the two sequences. You could set up the transitions so that each additional aligned character goes "up" a matrix (for example matrix 3 to matrix 4, then matrix 4 to matrix 5+, and finally staying in matrix 5+). You would pay an extra penalty for adding a gap near a previous one (going from matrix 1-4 to 0), but not after you're sufficiently far away (matrix 5+ to 0).

There are a lot of details, and in general it might be annoying to implement. And, as some of the other answers suggest, it's not clear if it's a valid assumption or what it might correspond to biologically. But it can be done, and in the same time complexity as standard Smith-Waterman alignment as long as the maximum tracked inter-gap distance is regarded as a constant.

0
Entering edit mode

Hello,

I just found "affine gap alignment" and I am trying to understand this approach because it very interesting for my research.

I know that "affine gap alignment" favors insertions and deletions and another advantage is to have a better group in alignment without having several separate gap.

I have to align this way to have a good group and do not have separate gap.

But I want to make an alignment of two sequences and there is the types of errors: mismatch(substitutions), insertions and deletions.

In this case, can I use the "affine gap alignment" for this type of alignment?

thanks

0
Entering edit mode

Yes, if I understand your question correctly, that's exactly what it's for.

0
Entering edit mode

I am trying to understand the approach and I want to program but I still want to understand.

Thank you

0
Entering edit mode

Those lecture notes are a good place to start, and I am sure there are many more available if you search further. The textbook "Biological sequence analysis" by Durbin et al. is a good reference. You can read their treatment of affine gap alignment on page 29 here (a sketchy scanned PDF of the textbook).

0
Entering edit mode

Can I use "Affine gap alignment" with BLAST?

or I have to program the idea of "Affine gap alignment"?

thanks

1
Entering edit mode
8.3 years ago
ajmackey ▴ 20

The answer to the OP is to increase the gap-open penalty, relative to the gap-extend penalty, to penalize multiple gaps over a region that seems to "need" gaps; this tunable ratio is what guides Smith-Waterman (and related algorithms) to consider more frequent, short indels vs. less frequent, longer indels.

1
Entering edit mode
8.3 years ago
Michael 54k

Sounds harmless as a modification at first glance, however it is a very big modification (in addition to that it is not feasible with standard SW). In fact, it is a statement about a different model of sequence evolution. What you are saying is that neighboring indels occur less frequently or are more deleterious. You'd better have a very good reason to deviate from the standard (null) model for any serious analysis, otherwise you will just confirm your prejudice. Do you have such evidence, that this is the case wherever you look in the genome? Note that a modified aligner would not necessarily replace an indel with a longer gap opening, but might find very different alignments. Also, you could not use it to reject the null hypothesis, because it is already biased.

0
Entering edit mode
8.3 years ago

The standard algorithm doesn't track the exact length of an indel, but if this is vital to whatever you're doing you could certainly modify it to do so. There's no amount of penalty score tweaking that will do this without changing the algorithm itself.

1
Entering edit mode

This isn't quite correct; Smith-Waterman with affine gap penalties certainly does keep track of the length of an indel, but I think you meant it doesn't keep track of how many residues ago the previous indel occurred (which is the OP's question).

0
Entering edit mode

Indeed, I blame a lack of coffee at the time :)

0
Entering edit mode

Yep, that's impossible as a normal affine-weighted implementation does not track the length of match stretches or what they abut. You'd need an additional state (ins or del -> match/sub) which requires an additional matrix.