Question: What Value For The Gap Penalty Should Be Used With A PAM250 Substitution Matrix?
coderodde wrote, 6.5 years ago:

Hello.

I have read the article "A* with Partial Expansion for large branching factor problems" [1] by T. Yoshizumi, T. Miura and T. Ishida, implemented their algorithm, and applied it to the multiple sequence alignment problem. The paper presents the PAM250 substitution matrix used for scoring the alignment, but the authors never state which gap penalty value they used. They only mention that the prior relevant research paper proposes a value of 8. With that value, my program produces an alignment that does not match the result of ClustalX on the same input.

What gap penalty value should I use in the context of [1]?

Niek De Klein (Netherlands) wrote, 6.5 years ago:

I can't find it on their website right now, but are you sure that ClustalX uses PAM250? It's an outdated matrix and I would imagine ClustalX to use BLOSUM.

coderodde wrote, 6.5 years ago:

[Clarification]

The article presents a demo alignment of three sequences of 4, 3 and 3 amino acids, as follows:

ACGH
CFG
EAC,

which both the article and Clustal align as:

-AC-GH
--CFG-
EAC---

But if I use the implied gap penalty value of 8, I get:

ACGH
-CFG
-EAC

My implementation can reproduce the alignment from the article and Clustal, but only if I change the gap penalty from 8 to 5. So my question refines to: "No matter how outdated PAM250 is, what is the bioinformaticians' consensus on the gap penalty value to use with that substitution matrix?"
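To see why the choice between 8 and 5 can flip the result, it may help to score both candidate alignments directly. The following is only a minimal sketch: it assumes a sum-of-pairs cost (lower is better), a linear penalty charged per gap character, and zero cost for a gap facing a gap. The `toy_cost` function is a made-up placeholder (0 for a match, 6 for a mismatch) and is NOT the PAM250-based table from [1], so the totals are illustrative only.

```python
from itertools import combinations

GAP = "-"


def toy_cost(a, b):
    """Placeholder substitution cost, NOT PAM250: 0 for a match, 6 for a mismatch."""
    return 0 if a == b else 6


def sum_of_pairs_cost(rows, gap_cost, sub_cost=toy_cost):
    """Sum-of-pairs cost of an alignment under a linear gap penalty."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    total = 0
    for x, y in combinations(rows, 2):      # every pair of sequences
        for a, b in zip(x, y):              # every column of that pair
            if a == GAP and b == GAP:
                continue                    # gap against gap: assumed free
            if a == GAP or b == GAP:
                total += gap_cost           # residue against gap
            else:
                total += sub_cost(a, b)     # residue against residue
    return total


article = ["-AC-GH", "--CFG-", "EAC---"]   # alignment from the article/Clustal
compact = ["ACGH", "-CFG", "-EAC"]         # alignment obtained with penalty 8

for g in (8, 5):
    print(g, sum_of_pairs_cost(article, g), sum_of_pairs_cost(compact, g))
```

Under these toy costs the article alignment contains 10 residue-versus-gap columns and no mismatches, while the compact one contains 2 such columns and 8 mismatches. The gapped alignment therefore only wins once a gap is cheaper than a mismatch, which is the same kind of threshold effect described above.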

Niek De Klein (Netherlands) wrote, 6.5 years ago:

There is no real consensus on the gap penalty, because the 'best' gap penalty depends on which sequences you want to align. If you want to align proteins from two very distant organisms, you want to set the gap penalties lower than if you want to align two proteins from closely related organisms.

As for the default value of ClustalX: according to this article on "Effects of Gap Open and Gap Extension Penalties", ClustalW uses GOP 15.0 and GEP 6.66 by default. GOP is the gap opening penalty and GEP is the gap extension penalty. I think ClustalX uses the same defaults as ClustalW.
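Note that a GOP/GEP pair describes an affine gap penalty, which is a different model from a single per-gap value such as 8. A rough sketch of the difference, assuming the common convention that a gap of length k costs GOP + k * GEP (some tools charge GOP + (k - 1) * GEP instead, and as far as I know Clustal also adjusts its penalties depending on position, so this is not how Clustal computes its final score):

```python
def linear_gap_cost(length, per_residue=8.0):
    """Linear model: every gap character costs the same amount."""
    return per_residue * length


def affine_gap_cost(length, gop=15.0, gep=6.66):
    """Affine model (assumed convention): opening cost plus a cost per gap character."""
    if length <= 0:
        return 0.0
    return gop + gep * length


for k in (1, 2, 3):
    print(k, linear_gap_cost(k), round(affine_gap_cost(k), 2))
```

The affine defaults are also defined on the scale of the matrices Clustal uses, so they may not be directly comparable to a penalty of 8 or 5 applied to the cost table in the paper.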

(modified 5.5 years ago)