Computing Hamming similarity (opposite of distance) between string P and each position in string T
1
0
Entering edit mode
7 months ago
lehuyduc3 • 0

First of all, this is not a question about alignment, but more about string matching/searching, but it might be related.

My problem is in the following image, but I don't know what's its name. The problem is defined as follows:

Input: string P length m, string T length n (P is a short sequence, N is a long genome)

Algorithm: for each position i in [0..n-m], compute res[i] = sum(P==T[i:i+m])

                   In other word, res[i] = the number of equal characters between P and T[i:i+m]


Output: array res[i]

Does anyone know the name of this problem? Or is it never used anywhere?

If it is used, then what's the current method to calculate res[] ?

Thank you.

alignment scoring programming • 611 views
0
Entering edit mode
7 months ago

What you describe appears to be the Hamming distance subtracted from the length of the string.

In a more pedestrian naming, what you have there is called the "number of matches" in an alignment.

A semi-global aligner with zero penalties for end gaps and not allowing internal gaps will find the maximal score in that res[i] vector (and the position of i)

(since the alignments are gapless other fast options may exists, but fast methods do not need to compute the full res[i]

0
Entering edit mode

Oh, so it doesn't have an unique name :\

So in case I want to compute the min(res[]) (instead of the full res[i]), which method can I use?

0
Entering edit mode

The minimum of that score would be the maximal Hamming distance.

There is a concept called maximal Hamming distance under rotation, that sounds more like what you have with some caveats:

0
Entering edit mode

Thanks! I will look into that