Question

MemoryError w/ matrix | Smith-Waterman Algorithm Python

0

Entering edit mode

4.2 years ago

Anonymous ▴ 10

I am implementing my own version of the Smith-Waterman algorithm.

This works perfectly fine with 2 genomes of 100 chars each. However, I want to run this pairwise sequence alignment algorithm on the full-length genomes (10,000 * 10,000). I get a MemoryError.

Code:

m, n = len(seqA), len(seqB)  # let m, n denote the lengths of two sequences

pointer_matrix = tasks.empty_matrix(m + 1, n + 1)  # stores traceback path (same dimensions)

score_matrix = tasks.empty_matrix(m+1, n+1)  # '+1' - far-left col & top row represent indexes of sequences' chars

max_score = 0  # Initial maximum score obtainable
# Contents & Declare Pointers
for i in range(1, m+1):
    for j in range(1, n+1):
        above_score = score_matrix[i][j-1] + gap_points # ERROR
        left_score = score_matrix[i-1][j] + gap_points
        above_left_score = score_matrix[i-1][j-1] + tasks.match(seqA[i-1], seqB[j-1], points_scheme)
        score_matrix[i][j] = max(above_score, left_score, above_left_score, 0)

Memory Error:

    Traceback (most recent call last):
      File "...\main.py", line 72, in <module>
        SmithWaterman.SmithWaterman("DNA", ["genome1", "genome2"], **points)
      File "...\SmithWaterman.py", line 26, in SmithWaterman
        above_score = score_matrix[i][j-1] + gap_points
    MemoryError

Python Smith-Waterman • 1.6k views

ADD COMMENT • link updated 16 months ago by Ram 45k • written 4.2 years ago by Anonymous ▴ 10

0

Entering edit mode

Please stop deleting posts after they have gotten responses. It is poor etiquette.

ADD REPLY • link 4.2 years ago by jared.andrews07 ★ 19k

score 1 · Answer 1 · 2021-04-12

Use a python array or a numpy array to create large, matrix-like objects.

https://docs.python.org/3/library/array.html

https://numpy.org/doc/stable/reference/generated/numpy.array.html

For those, you can accurately compute the memory needs.

Python lists use massively more memory than strictly needed as they need to be able to represent various types.