How To Interpret Gap Penalty For Sequence Alignment?
12.1 years ago
Shan ▴ 50

Hi, according to the literature, gap penalties for sequence alignment can favor larger gaps (affine gap penalty) or fewer gaps (linear gap penalty). I am wondering what could be done if we don't want to favor any gaps. Would that mean setting the gap penalty to zero, or is it a domain-specific value?

Thanks a lot

sequence alignment • 36k views

What do you mean by not favoring gaps? Making no distinction between small and large gaps, or not caring about gaps at all (in which case you can do what DK suggested)?


Is this in the context of a particular alignment program, or are you thinking about sequence alignment in the abstract?

12.1 years ago

If you don't want to favor gaps, you should make your gap penalty a large value.

The gap penalty is exactly what it sounds like: a penalty. So if you want gaps not to be favored, you need to place a heavier penalty on alignments that contain gaps. Setting the gap penalty to zero would mean that gaps are not penalized at all, so the aligner is free to insert as many as it likes.
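To make that concrete, here is a minimal sketch of a global (Needleman-Wunsch style) aligner with a single linear gap penalty. The function name `global_align`, the scoring values and the two toy sequences are my own illustrative choices, not taken from any particular program.

```python
def global_align(a, b, match=1, mismatch=-1, gap=2):
    """Global alignment with a linear gap cost.

    `gap` is subtracted once for every gap column (hypothetical
    default values, chosen only for illustration).
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -gap * i
    for j in range(1, m + 1):
        score[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] - gap,      # gap in b
                score[i][j - 1] - gap,      # gap in a
            )

    # Trace back one optimal path, preferring the diagonal on ties.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] - gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


a, b = "ACGTACGT", "ACGAACGT"
for penalty in (10, 0):
    total, top, bottom = global_align(a, b, gap=penalty)
    print(f"gap={penalty} score={total}")
    print(top)
    print(bottom)
```

With the large penalty the two sequences are aligned without gaps and the single mismatch is accepted (score 6). With a penalty of zero the score rises to 7, because the mismatch is replaced by a pair of free gaps. That is the sense in which a zero penalty favors gaps rather than being neutral about them.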


Can you please give an interpretation of that? How would it happen?

12.1 years ago
Ying W ★ 4.2k

I think you are missing a key concept of gap penalties: there are two values.

  • Gap opening - cost to create a gap
  • Gap extension - cost to make a gap bigger (must already have created a gap)

Here are some biological situations to keep in mind when changing these two penalties:

  • high gap opening penalty: the aligner avoids opening gaps, so real point insertions/deletions would ruin your alignment
  • high gap extension penalty: long gaps become expensive, so this will affect the size of indels in your alignment
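As a rough sketch of how the two values combine, the snippet below compares a linear cost with an affine cost for gaps of different lengths. The function names and the numbers (2, 10 and 1) are hypothetical, and I am assuming the convention where the opening cost covers the first gap position; some programs instead charge the extension cost on every position, including the first.

```python
def linear_gap_cost(length, per_position=2):
    """Linear model: every gap position costs the same."""
    return per_position * length


def affine_gap_cost(length, gap_open=10, gap_extend=1):
    """Affine model: pay `gap_open` once, then `gap_extend` for each
    further position (assumed convention, see the note above)."""
    if length <= 0:
        return 0
    return gap_open + gap_extend * (length - 1)


# Six gap positions in total, either as one long indel or as three short ones.
one_long = affine_gap_cost(6)
three_short = 3 * affine_gap_cost(2)
print("affine:", one_long, "vs", three_short)
print("linear:", linear_gap_cost(6), "vs", 3 * linear_gap_cost(2))
```

Under the affine model the single long gap is much cheaper than three short ones (15 versus 33 with these numbers), while the linear model cannot tell the two cases apart (12 in both). Raising `gap_open` makes every new gap more expensive, and raising `gap_extend` mainly punishes long gaps.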
12.1 years ago
Ari ▴ 120

The gap penalties affect the alignment result: high penalties produce compact alignments with few gaps, and low penalties produce the opposite. Analysing data without a prior assumption of the "true" penalty (an assumption that would favor gaps of a certain length) is a great aim; unfortunately, there are no perfect solutions for doing this.

I have a couple of comments:

  • one cannot separate gap parameters from the substitution scoring parameters. If one sets the gap penalties to zero, the alignment depends on the average scores given for matches and mismatches. Typically these substitution scores are optimised for a certain evolutionary distance and do not behave correctly for sequence pairs that are either more similar or less similar than expected. (Even worse is the use of completely artificial scoring matrices by some popular methods: using a non-negative matrix, two random sequences will match perfectly fine and get aligned across their full length.)

  • segment-based aligners (e.g. Dialign), which find significant matches and leave the rest unaligned, do not explicitly model gaps and have no gap penalties. The downside is that the resulting alignments may not be fully aligned. Also, the methods may be statistically sound, but they are not necessarily biologically realistic, nor do they use the evolutionary information available.

  • methods based on insertion-deletion models (e.g. BAli-Phy, StatAlign) infer the gap parameters from the data, and the results are not affected by prior choices for the gap penalties. The downside is that the gap models are rather simplistic: in real life we observe gaps of hugely different lengths, and it is very difficult to model this variation.
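The first point can be illustrated with a small experiment. This is only a sketch: `global_score`, the two scoring schemes and the random test sequences are my own assumptions. With match = 1, mismatch = 0 and gap = 0 nothing is ever subtracted, so the optimal global score equals the length of the longest common subsequence, even for unrelated sequences.

```python
import random


def global_score(a, b, match, mismatch, gap):
    """Score-only global alignment with a linear gap penalty,
    keeping one row of the dynamic-programming table at a time."""
    prev = [-gap * j for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [-gap * i]
        for j, y in enumerate(b, start=1):
            sub = match if x == y else mismatch
            curr.append(max(prev[j - 1] + sub, prev[j] - gap, curr[j - 1] - gap))
        prev = curr
    return prev[-1]


random.seed(0)  # arbitrary seed, only to make the run repeatable
x = "".join(random.choice("ACGT") for _ in range(200))
y = "".join(random.choice("ACGT") for _ in range(200))

# Identities when the two random sequences are simply laid side by side.
identities = sum(p == q for p, q in zip(x, y))

# Non-negative scoring with free gaps versus a scheme with real penalties.
free = global_score(x, y, match=1, mismatch=0, gap=0)
penalised = global_score(x, y, match=1, mismatch=-1, gap=2)

print("ungapped identities:", identities)
print("score with free gaps:", free)
print("score with penalties:", penalised)
```

The score with free gaps can never be lower than the penalised score or the ungapped identity count, because the aligner may shift residues past each other at no cost to collect more matches. A high score under such a scheme therefore says very little about whether the sequences are related.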
