Gap penalties affect the resulting alignment: high penalties produce compact alignments with few gaps, while low penalties produce longer alignments with many gaps. The aim of analysing data without a prior assumption about the "true" penalty (which would favor gaps of a certain length) is a good one; unfortunately, there is no perfect way of doing this.
I have a few comments:
one cannot separate the gap parameters from the substitution scoring parameters. If one sets the gap penalties to zero, the alignment depends on the average scores given for matches and mismatches. Typically these substitution scores are optimised for a certain evolutionary distance and do not behave correctly for sequence pairs that are either more similar or less similar than expected. (Even worse is the use of completely artificial scoring matrices by some popular methods: with a non-negative matrix, two random sequences match perfectly well and get aligned across their full length.)
segment-based aligners (e.g. Dialign), which find significant matches and leave the rest unaligned, do not explicitly model gaps and have no gap penalties. The downside is that the resulting alignments may not be fully aligned. Also, the methods may be statistically sound, but they are not necessarily biologically realistic, nor do they use the evolutionary information available.
methods based on insertion-deletion models (e.g. BAli-Phy, StatAlign) infer the gap parameters from the data, so the results are not affected by prior choices of gap penalties. The downside is that the gap models are rather simplistic: in real life we observe gaps of hugely different lengths, and it is very difficult to model this variation.
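To make the first comment concrete, here is a minimal sketch of a toy global aligner (plain Needleman-Wunsch with a linear gap cost). It is not what any of the tools mentioned above implement; the function names, the two example sequences and the scoring values are all made up for illustration. It shows two things: with a non-negative matrix and a strong gap penalty, two unrelated sequences of equal length are simply aligned end to end without gaps, and with zero gap penalty and a negative mismatch score, every mismatch is replaced by gaps.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Toy Needleman-Wunsch with a linear gap cost.

    Returns (score, row_a, row_b). All scores are integers (gap <= 0).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align two residues
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a

    # Traceback from the bottom-right corner, preferring the diagonal on ties.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                row_a.append(a[i - 1])
                row_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


def column_counts(row_a, row_b):
    """Count (matches, mismatches, gap columns) in an alignment."""
    matches = mismatches = gaps = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            gaps += 1
        elif x == y:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


if __name__ == "__main__":
    # Two arbitrary, unrelated sequences of equal length (hypothetical data).
    seq_a = "GATTACAGATTA"
    seq_b = "CCGTAGCTTGCA"
    settings = [
        ("non-negative matrix, strong gap penalty", dict(match=1, mismatch=0, gap=-10)),
        ("negative mismatch, zero gap penalty", dict(match=1, mismatch=-1, gap=0)),
    ]
    for label, params in settings:
        s, ra, rb = global_align(seq_a, seq_b, **params)
        print(label, "score:", s, "(match, mismatch, gap):", column_counts(ra, rb))
        print(ra)
        print(rb)
```

The gap penalty alone does not decide the outcome here: the same gap value gives very different alignments depending on how mismatches are scored, which is why the two sets of parameters have to be chosen together.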
7.6 years ago by
Ari • 110