MemoryError w/ matrix | Smith-Waterman Algorithm Python
1
0
Entering edit mode
4 weeks ago
Daniel Bell ▴ 10

I am implementing my own version of the Smith-Waterman algorithm.

This works perfectly fine with 2 genomes of 100 chars each. However, I want to run this pairwise sequence alignment algorithm on the full-length genomes (10,000 * 10,000). I get a MemoryError.

Code:

m, n = len(seqA), len(seqB)  # let m, n denote the lengths of two sequences

pointer_matrix = tasks.empty_matrix(m + 1, n + 1)  # stores traceback path (same dimensions)

score_matrix = tasks.empty_matrix(m+1, n+1)  # '+1' - far-left col & top row represent indexes of sequences' chars

max_score = 0  # Initial maximum score obtainable
# Contents & Declare Pointers
for i in range(1, m+1):
for j in range(1, n+1):
above_score = score_matrix[i][j-1] + gap_points # ERROR
left_score = score_matrix[i-1][j] + gap_points
above_left_score = score_matrix[i-1][j-1] + tasks.match(seqA[i-1], seqB[j-1], points_scheme)
score_matrix[i][j] = max(above_score, left_score, above_left_score, 0)


MemoryError:

    Traceback (most recent call last):
File "...\main.py", line 72, in <module>
SmithWaterman.SmithWaterman("DNA", ["genome1", "genome2"], **points)
File "...\SmithWaterman.py", line 26, in SmithWaterman
above_score = score_matrix[i][j-1] + gap_points
MemoryError

Matrix Smith-Waterman Python Memory Memory-Error • 175 views
0
Entering edit mode

Please stop deleting posts after they have gotten responses. It is poor etiquette.

1
Entering edit mode
4 weeks ago

Use a python array or a numpy array to create large, matrix-like objects.

https://docs.python.org/3/library/array.html

https://numpy.org/doc/stable/reference/generated/numpy.array.html

For those, you can accurately compute the memory needs.

Python lists use massively more memory than strictly needed as they need to be able to represent various types.