How to get best two alignments?
1
1
Entering edit mode
8.1 years ago
hamza ▴ 10

I am asked to write a python code to get the best two alignments. respectively. How can I do that?

sequence alignment • 1.5k views
ADD COMMENT
0
Entering edit mode

I think this is not using edit distance scoring function, right?

ADD REPLY
0
Entering edit mode

The closed-form solution counts all gapped alignments, and it is derived from recurrence relations for alignments of sequences of lengths m and n. Gaps in an alignment would represent insertions or deletions relative to one or the other sequence. An edit distance function usually scores gaps and substitutions. The score shouldn’t necessarily matter in terms of number of total alignments, unless you need to count only some subset of all alignments under some cost or score threshold.

ADD REPLY
2
Entering edit mode
8.1 years ago

A closed-form solution is offered in An exact formula for the number of alignments between two DNA sequences by Torres, Cabada, and Nieto. If the recurrence formula described in this paper seems reasonable, you could calculate this directly:

#!/usr/bin/env python

import sys
import scipy.special

def alignments(m, n):
    if m < 0 or n < 0:
        raise ValueError('m and n should be non-negative')
    s = 0
    b = min(m, n) + 1
    for k in xrange(0, b):
        s += (2 ** k) * scipy.special.binom(m, k) * scipy.special.binom(n, k)
    return s

def main():
    m = 4
    n = 2
    a = alignments(m, n)
    assert (m == 4 and n == 2 and a == 41), "something went wrong!"
    sys.stdout.write("alignments(%d, %d) = %d\n" % (m, n, a))

if __name__ == "__main__":
    main()

The scipy.special.binom function approximates its calculation of the binomial coefficient, which is probably going to be an issue for large m and n. If you're working with large values, you may want to look into using the Python long type and consider the accuracy of the answer. The Stack Overflow link offers alternatives to the scipy.special library that generate an exact calculation of the binomial coefficient, at the cost of a longer runtime.

ADD COMMENT

Login before adding your answer.

Traffic: 3206 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6