Computing Hamming similarity (opposite of distance) between string P and each position in string T
1
0
Entering edit mode
3.1 years ago
lehuyduc3 • 0

First of all, this is not a question about alignment, but more about string matching/searching, but it might be related.

My problem is in the following image, but I don't know what's its name.

Here P = 'agcga'.

The problem is defined as follows:

Input: string P length m, string T length n (P is a short sequence, N is a long genome)

Algorithm: for each position i in [0..n-m], compute res[i] = sum(P==T[i:i+m])

                   In other word, res[i] = the number of equal characters between P and T[i:i+m]

Output: array res[i]

Does anyone know the name of this problem? Or is it never used anywhere?

If it is used, then what's the current method to calculate res[] ?

Thank you.

alignment scoring programming • 1.5k views
ADD COMMENT
0
Entering edit mode
3.1 years ago

What you describe appears to be the Hamming distance subtracted from the length of the string.

In a more pedestrian naming, what you have there is called the "number of matches" in an alignment.

A semi-global aligner with zero penalties for end gaps and not allowing internal gaps will find the maximal score in that res[i] vector (and the position of i)

(since the alignments are gapless other fast options may exists, but fast methods do not need to compute the full res[i]

ADD COMMENT
0
Entering edit mode

Oh, so it doesn't have an unique name :\

So in case I want to compute the min(res[]) (instead of the full res[i]), which method can I use?

ADD REPLY
0
Entering edit mode

The minimum of that score would be the maximal Hamming distance.

There is a concept called maximal Hamming distance under rotation, that sounds more like what you have with some caveats:

ADD REPLY
0
Entering edit mode

Thanks! I will look into that

ADD REPLY

Login before adding your answer.

Traffic: 2514 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6