Question: Gap penalty in Smith-Waterman
4.9 years ago, a user from the United Kingdom wrote:

Hi all,


I am looking for advice on how to calculate the gap-opening and affine gap-extension penalties used in dynamic programming approaches to sequence alignment. I understand that substitution matrices are simple log-odds (lod) scores, but I always see somewhat hand-wavy justifications of gap penalties.
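For reference, the log-odds idea mentioned above can be sketched in a few lines of Python. This is a minimal sketch, assuming you already have counts of residue pairs observed aligned in trusted alignments; the function name `lod_matrix`, the pseudocount and the half-bit scale (the unit convention BLOSUM62 uses) are illustrative choices, and published matrices such as BLOSUM involve extra steps, for example clustering similar sequences before counting.

```python
import math


def lod_matrix(pair_counts, alphabet, scale=2.0, pseudocount=1.0):
    """Log-odds substitution scores from counts of aligned residue pairs.

    pair_counts[(a, b)] counts how often residue a was seen aligned to
    residue b in trusted alignments. The score is
        s(a, b) = round(scale * log2(p_ab / (q_a * q_b)))
    where p_ab is the observed pair frequency and q_a the background
    frequency of a. scale=2 gives half-bit units. With pseudocount=0,
    a pair that was never observed raises ValueError (log of zero).
    """
    # Count (a, b) and (b, a) together so the matrix is symmetric; the
    # pseudocount avoids log(0) for pairs that were never observed.
    sym = {
        (a, b): pair_counts.get((a, b), 0) + pair_counts.get((b, a), 0) + pseudocount
        for a in alphabet
        for b in alphabet
    }
    total = sum(sym.values())
    p = {pair: c / total for pair, c in sym.items()}
    # Background frequencies are the marginals of the pair frequencies.
    q = {a: sum(p[(a, b)] for b in alphabet) for a in alphabet}
    return {
        (a, b): round(scale * math.log2(p[(a, b)] / (q[a] * q[b])))
        for a in alphabet
        for b in alphabet
    }


# Hypothetical counts over a two-letter alphabet, for illustration only.
counts = {("A", "A"): 40, ("C", "C"): 40, ("A", "C"): 10, ("C", "A"): 10}
scores = lod_matrix(counts, "AC")
print(scores[("A", "A")], scores[("A", "C")])  # 1 -3
```

Pairs seen together more often than chance get positive scores and rarer pairs get negative ones; nothing comparable falls out of pair counts for gaps, which is the gap the question is pointing at.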

As an aside, why is there so much less literature on substitution matrices for DNA than for proteins? Presumably there is a reason for this.



4.9 years ago, Istvan Albert (University Park, USA) wrote:

Interesting question on a subject I never actually thought about. 

I would say the reason there are few results on DNA substitution matrices is that there are so few options: for any given base there are just three possible substitutions, and how they should be scored depends on the context in which the alignment is used. In addition, noncoding DNA is much less conserved and has much less clearly defined function than protein-coding regions, so it is hard to come up with a general rule.
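To make the "so few options" point concrete, here is a hypothetical DNA scoring function in Python. It assumes the common refinement of distinguishing transitions (A<->G, C<->T) from transversions; the three score values are arbitrary placeholders, not recommended parameters. The `expected_score` helper checks one property any local-alignment scheme needs: the expected score of two unrelated random bases must be negative, otherwise Smith-Waterman alignments tend to keep extending through unrelated sequence.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def dna_score(a, b, match=2, transition=-1, transversion=-3):
    """Score two DNA bases; the default values are illustrative placeholders.

    Assumes a and b are A, C, G or T (any case); any other mismatching
    characters fall through to the transversion score.
    """
    a, b = a.upper(), b.upper()
    if a == b:
        return match
    # Transitions swap purine<->purine or pyrimidine<->pyrimidine.
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return transition
    return transversion


def expected_score(freqs, score=dna_score):
    """Expected score of two independent bases drawn from freqs."""
    return sum(freqs[a] * freqs[b] * score(a, b) for a in freqs for b in freqs)


uniform = {base: 0.25 for base in "ACGT"}
print(expected_score(uniform))  # -1.25
```

The whole scheme has only three free parameters, compared with the 210 distinct entries of a symmetric 20x20 protein matrix, which is one way of seeing why there is less to publish about DNA matrices.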

As for gaps: the information in a mismatch is easy to capture and formalize, but a gap's significance depends on what is being replaced, how long the gap is, and so on.
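The length dependence mentioned here is usually handled with an affine penalty, where a gap of length L costs gap_open + (L - 1) * gap_extend. Below is a minimal Python sketch of the score-only Smith-Waterman recurrence with affine gaps (Gotoh's three-matrix formulation). The toy +2/-1 scorer `simple_score` and the penalty values are assumptions chosen only to show that the balance between gap_open and gap_extend changes which alignment wins; they are not a justified choice of parameters.

```python
def smith_waterman_affine(s, t, score, gap_open=5, gap_extend=1):
    """Best local alignment score of s and t with affine gap penalties.

    A gap of length L costs gap_open + (L - 1) * gap_extend.
    score(a, b) returns the substitution score for characters a and b.
    Only the score is returned; no traceback is kept.
    """
    neg_inf = float("-inf")
    n, m = len(s), len(t)
    # H[i][j]: best local alignment score ending at cell (i, j), floored at 0.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    # E[i][j]: best score ending with t[j-1] aligned to a gap.
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    # F[i][j]: best score ending with s[i-1] aligned to a gap.
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap (from H) or extend an existing one.
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            diag = H[i - 1][j - 1] + score(s[i - 1], t[j - 1])
            H[i][j] = max(0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


def simple_score(a, b):
    """Toy substitution score (assumed values): +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1


print(smith_waterman_affine("AAAACCCC", "AAAAGGGGCCCC", simple_score, 3, 1))  # 10
print(smith_waterman_affine("AAAACCCC", "AAAAGGGGCCCC", simple_score, 3, 3))  # 8
```

With a cheap extension (3, 1) the best local alignment bridges the GGGG insertion (16 - 6 = 10); with a flat 3 per gapped base, bridging costs 12, so only one four-base block is aligned (score 8). A full implementation would also keep traceback pointers to recover the alignment itself.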


So what you are saying is that the gap penalty has to be worked out for each particular alignment context, taking into account the bases being replaced? Do you have a sense of how people generally come up with the substitution matrices for Smith-Waterman local alignment of DNA? It seems to be mostly a qualitative choice. Is this a fair characterisation?


Scoring is a measure of similarity: it is used to compare sequences and serves as a metric. For that to work properly, it has to be able to quantify the differences, and with DNA alone there is just not enough information. It is a bit like trying to infer someone's height from their shoe size: it works for the extreme cases (a baby vs. Shaq), but it does not contain enough information to properly characterize a person of average height.
