Hi all,
I am looking for advice on how to calculate the gap-open and affine gap-extension penalties used in dynamic programming approaches to sequence alignment. I understand that substitution matrices are simply log-odds (LOD) scores, but I only ever see somewhat hand-wavy justifications for gap penalties.
As an aside, is there a reason why there seems to be far less literature on substitution matrices for DNA than for proteins?
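For concreteness, here is a minimal sketch of the affine gap model I am asking about. The function name and the default values (10 and 0.5) are placeholders of my own for illustration, not recommended settings, and I am assuming the convention where the first gapped position pays the opening cost and each further position pays the extension cost. Some tools instead charge open + length * extend, so check the convention before comparing numbers.

```python
def affine_gap_penalty(length, gap_open=10.0, gap_extend=0.5):
    """Penalty for one contiguous gap spanning `length` positions.

    Assumed convention: the first position costs `gap_open`,
    every additional position costs `gap_extend`.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0.0  # no gap, no penalty
    return gap_open + (length - 1) * gap_extend


# One long gap is cheaper than several short ones of the same total length.
one_gap_of_five = affine_gap_penalty(5)
five_gaps_of_one = 5 * affine_gap_penalty(1)
print(one_gap_of_five, five_gaps_of_one)
```

The formula itself is the easy part; what I am missing is a principled way to pick the two parameters.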
Cheers
So what you are saying is that the gap penalty has to be calculated for a given base match, taking the context of the replacement base into account? Do you have a sense of how people generally come up with substitution matrices for Smith-Waterman local alignment of DNA? It seems to be mostly a qualitative choice. Is this a fair characterisation?
Scoring is a measure of similarity: it is used to compare sequences and serves as a metric. For that to work properly, it has to be able to quantify the differences. With DNA alone there is simply not enough information. It is a bit like trying to infer someone's height from their shoe size. It works for the extreme cases (a baby vs Shaq), but it does not contain enough information to properly characterize a person of average height.
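To make the log-odds point from the original question concrete, here is a rough sketch of how a DNA match/mismatch pair can be derived as log-odds scores. It assumes uniform base frequencies (0.25 each), that all twelve mismatch types are equally likely, and a chosen target percent identity. The function name and the `scale` parameter are made up for illustration. With only four symbols the whole matrix collapses to two numbers, which fits the point above about how little information there is to work with.

```python
import math


def dna_log_odds(identity, scale=1.0):
    """Return (match, mismatch) log-odds scores in bits times `scale`.

    Assumptions: uniform base composition and all mismatches
    equally likely at the given target fraction of identity.
    """
    if not 0.0 < identity < 1.0:
        raise ValueError("identity must be strictly between 0 and 1")
    background = 0.25 * 0.25            # chance of any specific aligned pair
    q_match = identity / 4              # 4 possible matching pairs
    q_mismatch = (1 - identity) / 12    # 12 possible mismatching pairs
    match = scale * math.log2(q_match / background)
    mismatch = scale * math.log2(q_mismatch / background)
    return match, mismatch


for target in (0.65, 0.75, 0.95):
    m, mm = dna_log_odds(target)
    print(f"identity={target:.2f} match={m:+.2f} mismatch={mm:+.2f}")
```

Under these assumptions, a target of 75% identity gives a match and mismatch score of equal magnitude, and higher targets push the mismatch penalty up relative to the match reward. This says nothing about gap penalties, which still have to be chosen separately.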