Question: How To Evaluate The Scoring Scheme Used In Pairwise Alignment
Maria wrote (5.4 years ago):

Hello, I want to align two nucleotide sequences using a semi-global alignment method, in which gaps at the ends of the sequences are free. I'm using a simple scoring scheme, i.e. constants for match, mismatch and gap. The problem is that I don't know which scores to choose, because changing them also changes the length of the aligned region. Below is an example:

1. Longer sequence: cggacgtgccattgcatgccccgggacgc
2. Shorter sequence: acgtggattacgagagaga

The alignment should look like this:

```
cgggacgtgccattgcatgccccgggacgc
----acgtggatt-----------------
```
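For reference, here is a minimal Python sketch of the kind of semi-global alignment described above: constant match, mismatch and (linear) gap scores, with leading and trailing gaps free in both sequences. The function name `semi_global_align` and its default scores (+1/-1/-2) are illustrative assumptions, not taken from any library.

```python
def semi_global_align(long_seq, short_seq, match=1, mismatch=-1, gap=-2):
    """Semi-global alignment: gaps at the start or end of either sequence are free.

    Returns (score, aligned_long, aligned_short). Linear gap penalty only.
    """
    n, m = len(long_seq), len(short_seq)
    # score[i][j] = best score for long_seq[:i] vs short_seq[:j].
    # Row 0 and column 0 stay at 0, so leading gaps cost nothing.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if long_seq[i - 1] == short_seq[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,  # (mis)match
                              score[i - 1][j] + gap,    # gap in short_seq
                              score[i][j - 1] + gap)    # gap in long_seq
    # Trailing gaps are free too: the alignment may end anywhere on the
    # last row or the last column.
    best, end_i, end_j = max([(score[n][j], n, j) for j in range(m + 1)] +
                             [(score[i][m], i, m) for i in range(n + 1)])

    cols = []  # (long_char, short_char) columns, collected back to front
    # Free trailing overhang (at most one of these loops runs).
    for k in range(n - 1, end_i - 1, -1):
        cols.append((long_seq[k], "-"))
    for k in range(m - 1, end_j - 1, -1):
        cols.append(("-", short_seq[k]))
    i, j = end_i, end_j
    while i > 0 and j > 0:
        s = match if long_seq[i - 1] == short_seq[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            cols.append((long_seq[i - 1], short_seq[j - 1]))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            cols.append((long_seq[i - 1], "-"))
            i -= 1
        else:
            cols.append(("-", short_seq[j - 1]))
            j -= 1
    # Free leading overhang.
    while i > 0:
        cols.append((long_seq[i - 1], "-"))
        i -= 1
    while j > 0:
        cols.append(("-", short_seq[j - 1]))
        j -= 1
    cols.reverse()
    return best, "".join(x for x, _ in cols), "".join(y for _, y in cols)
```

For example, `semi_global_align("cggacgtgccattgcatgccccgggacgc", "acgtggattacgagagaga", gap=-3)` returns the score and the two gapped rows, so you can print them and compare the layouts produced by different scoring schemes.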

*Question: how can I evaluate the scoring scheme I'm using, in order to choose the best one?*

Thanks in advance for any suggestions.


What do you mean by which score you should choose? The scores for matches, mismatches and gaps?

Yes. For example, match/mismatch/gap scores of +1/-1/-2 are one option, but how do I evaluate this choice? And why not +1/-1/-3, for example?
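A quick way to see what changing the scheme actually does is to rescore a fixed alignment under each candidate. The sketch below is a hypothetical helper (`score_alignment` is not from any library) that scores a gapped alignment with free leading and trailing gaps, applied to the alignment from the question under +1/-1/-2 and +1/-1/-3.

```python
def score_alignment(top, bottom, match, mismatch, gap):
    """Score a pairwise alignment whose leading/trailing gaps are free.

    Assumes both rows have equal length and at least one non-gap character.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")

    def residue_span(row):
        # First and last column that holds a residue (not a gap) in this row.
        cols = [k for k, c in enumerate(row) if c != "-"]
        return cols[0], cols[-1]

    top_first, top_last = residue_span(top)
    bot_first, bot_last = residue_span(bottom)
    total = 0
    for k, (x, y) in enumerate(zip(top, bottom)):
        if x == "-":
            # Gap in the top row: only charged if it lies inside the top sequence.
            if top_first < k < top_last:
                total += gap
        elif y == "-":
            # Gap in the bottom row: same rule for the bottom sequence.
            if bot_first < k < bot_last:
                total += gap
        else:
            total += match if x == y else mismatch
    return total


top = "cgggacgtgccattgcatgccccgggacgc"
bottom = "----acgtggatt" + "-" * 17  # the alignment shown in the question
for scheme in [(1, -1, -2), (1, -1, -3)]:
    print(scheme, score_alignment(top, bottom, *scheme))
# (1, -1, -2) 3
# (1, -1, -3) 3
```

Both schemes give this alignment the same score (6 matches, 3 mismatches) because it has no internal gaps; the gap penalty only matters when it changes which alignment is optimal.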

Niek De Klein wrote (5.4 years ago):

It depends on your dataset. For simple alignments like the one you showed, scoring +1 for a match and -1 for a mismatch is usually good enough. However, this does not distinguish between kinds of substitution, such as transitions and transversions. If you want to account for that, you probably want to use the Kimura model, which gives a scoring matrix like the one below. Transitions (A/G, C/T) are more common than transversions, so they score higher:

```
  A C G T
A 6 1 2 1
C 1 6 1 2
G 2 1 6 1
T 1 2 1 6
```
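The matrix can be encoded as a nested lookup table. In this sketch the names `KIMURA_LIKE`, `is_transition`, `substitution_score` and `ungapped_score` are hypothetical; the values are copied from the table, and the helpers simply make the pattern explicit: identities score 6, transitions 2, transversions 1.

```python
# Values copied from the matrix above (rows and columns A, C, G, T).
KIMURA_LIKE = {
    "A": {"A": 6, "C": 1, "G": 2, "T": 1},
    "C": {"A": 1, "C": 6, "G": 1, "T": 2},
    "G": {"A": 2, "C": 1, "G": 6, "T": 1},
    "T": {"A": 1, "C": 2, "G": 1, "T": 6},
}
PURINES = {"A", "G"}  # C and T are pyrimidines


def is_transition(x, y):
    """A<->G or C<->T: a change within purines or within pyrimidines."""
    x, y = x.upper(), y.upper()
    return x != y and ((x in PURINES) == (y in PURINES))


def substitution_score(x, y):
    return KIMURA_LIKE[x.upper()][y.upper()]


def ungapped_score(s1, s2):
    """Sum of substitution scores over two equal-length, gap-free sequences."""
    return sum(substitution_score(x, y) for x, y in zip(s1, s2))


print(substitution_score("A", "G"), substitution_score("A", "C"), substitution_score("T", "T"))
# 2 1 6
print(ungapped_score("acgtggatt", "acgtgccat"))
# 39
```

The second call scores the ungapped part of the question's alignment: six identities (36) plus three transversions (3).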

The right gap penalty depends on how divergent the sequences you are aligning are: for closely related organisms, use a low gap penalty; for more divergent organisms, use a higher one.
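To see how the gap penalty trades off against the substitution scores, the sketch below computes the optimal semi-global score (free end gaps, linear gap penalty, matrix above) for a toy pair under two penalties. The function name and the toy sequences are illustrative assumptions, and it returns scores only, without a traceback.

```python
# Substitution scores copied from the matrix above.
MATRIX = {
    "A": {"A": 6, "C": 1, "G": 2, "T": 1},
    "C": {"A": 1, "C": 6, "G": 1, "T": 2},
    "G": {"A": 2, "C": 1, "G": 6, "T": 1},
    "T": {"A": 1, "C": 2, "G": 1, "T": 6},
}


def semi_global_score(a, b, gap):
    """Optimal semi-global score (free end gaps) with a linear gap penalty."""
    n, m = len(a), len(b)
    # Zero first row/column: leading gaps in either sequence are free.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + MATRIX[a[i - 1]][b[j - 1]],
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    # Trailing gaps are free: take the best cell on the last row or column.
    return max(max(dp[n]), max(row[m] for row in dp))


a, b = "ACGTACGT", "ACGTTACGT"  # b carries one extra T
for gap in (-1, -20):
    print(gap, semi_global_score(a, b, gap))
# -1 47
# -20 33
```

With a cheap gap, the best alignment inserts one gap and matches all eight columns (48 - 1); with an expensive gap, a shifted ungapped alignment with three mismatches wins instead.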