Question: Bio.Align using smith-waterman local alignment causes memory leak
18 days ago by carlos_marchi (Brazil, São Paulo)

carlos_marchi wrote:

Hi!

I have a list of permutations of DNA sequence pairs, and I compute an alignment score for each pair. I don't know why this process causes a memory leak when the permutation list is large. Here is an example of the score calculation:

import math

from Bio import Align
from Bio.Align import substitution_matrices

for sequence1, sequence2 in sequence_permutation:
    score = self.__calculate_sequence_similarity(sequence1, sequence2)
    alignments[sequence1].append((sequence2, score))

save_alignments(alignments)

def __calculate_score_alignment(self, sequence1, sequence2):
    aligner = Align.PairwiseAligner()
    aligner.mode = 'local'
    # Note: BLOSUM62 is a protein substitution matrix, even though the
    # inputs here are DNA sequences.
    aligner.substitution_matrix = substitution_matrices.load('BLOSUM62')
    return aligner.score(sequence1, sequence2)


def __calculate_sequence_similarity(self, sequence1: str, sequence2: str) -> float:
    if not sequence1 and not sequence2:
        return -1

    score = self.__calculate_score_alignment(sequence1, sequence2)
    score1 = self.__calculate_score_alignment(sequence1, sequence1)
    score2 = self.__calculate_score_alignment(sequence2, sequence2)

    return score / (math.sqrt(score1) * math.sqrt(score2))
written 18 days ago by carlos_marchi
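Incidentally, because `score(sequence, sequence)` is recomputed for every pair a sequence appears in, caching the self-alignment scores avoids a lot of redundant work in a large permutation list. A minimal sketch of that idea, using a stand-in scoring function (`raw_score` here is a hypothetical placeholder for `aligner.score`, so the example is self-contained and does not require Biopython):

```python
import math
from functools import lru_cache

def raw_score(seq1, seq2):
    # Stand-in for aligner.score(seq1, seq2): counts matching positions.
    return float(sum(a == b for a, b in zip(seq1, seq2)))

@lru_cache(maxsize=None)
def self_score(seq):
    # Self-alignment score, computed once per distinct sequence.
    return raw_score(seq, seq)

def similarity(seq1, seq2):
    if not seq1 and not seq2:
        return -1
    # Same normalization as in the question: score / sqrt(s1) * sqrt(s2).
    return raw_score(seq1, seq2) / (math.sqrt(self_score(seq1)) * math.sqrt(self_score(seq2)))
```

With this, `similarity("ACGT", "ACGT")` returns `1.0`, and each sequence's self-alignment is computed only once no matter how many pairs it occurs in.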

A memory leak is a software bug. If it doesn't originate in your code but in a library you're using, you should report it to the library's authors. Make sure, though, that it really is a memory leak and not simply large memory usage caused by a large data set. Also note that many scripting languages, Python included, may not return freed memory to the operating system until the script has exited, so if your script builds a data structure using half the available RAM, most of that memory will stay associated with the process even after the data structure has been destroyed.

written 18 days ago by Jean-Karim Heriche
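One way to tell a genuine leak apart from retained allocations is the standard-library `tracemalloc` module, which compares snapshots and shows which source lines still hold memory. A sketch with a stand-in `noisy_score` function (a hypothetical placeholder for the real alignment call):

```python
import tracemalloc

def noisy_score(seq1, seq2):
    # Stand-in for the real alignment call; allocates temporaries each call.
    return sum(1 for a, b in zip(seq1, seq2) if a == b)

tracemalloc.start()
before = tracemalloc.take_snapshot()

scores = [noisy_score("ACGTACGT", "ACGTTCGT") for _ in range(10_000)]

after = tracemalloc.take_snapshot()
stats = after.compare_to(before, "lineno")

# The lines with the largest net growth point at whatever is still alive.
# A true leak shows size_diff that keeps growing iteration after iteration;
# retained data shows one allocation that matches your own data structures
# (here, the `scores` list).
for stat in stats[:5]:
    print(stat)
```

Running this inside the permutation loop at a few checkpoints would show whether the growth comes from the `alignments` dict itself or from something inside the aligner.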

The program's memory usage increases on each iteration, so it is not due to the dataset size; it may be some object that isn't being destroyed, as you wrote. The Aligner object is created anew in each iteration, so I can't see an error in that code.

written 18 days ago by carlos_marchi

This is something to report as an issue on the Biopython GitHub repository if you are confident it's a real problem with the library.

written 18 days ago by Joe
Powered by Biostar version 2.3.0