Parallelizing a Biopython process

4.5 years ago

dllopezr ▴ 120

Hi!

I am parsing BLAST XML output with Biopython and, for performance reasons, I want to parallelize the process. The code takes a large BLAST XML output file, performs a calculation on the alignments, and saves the result to a SQL database.

The code is as follows:

from Bio.Blast import NCBIXML
import pymysql
result_handle = open(*file*)
blast_records = NCBIXML.parse(result_handle)

# start SQL connection

def calculation(blast_record):
    # do calculation
    # upload to SQL
    pass

Normally, the code runs like this:

for blast_record in blast_records:
    calculation(blast_record)

However, when I try to use tools such as multiprocessing or joblib with a list comprehension like this:

processes = [mp.Process(target=calculation, args=(blast_record)) for blast_record in blast_records]

I get either an error or the code runs indefinitely without producing results.

Any help on how to structure the code to parallelize it, or any other advice?

biopython sql parallelization • 818 views


"I got either an error"

You'll have to be more specific here.

4.5 years ago by WouterDeCoster 47k
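
Without the exact error it is hard to be sure, but two things in the posted snippet stand out. First, args=(blast_record) is not a tuple; a one-element tuple needs a trailing comma, as in args=(blast_record,). Second, the mp.Process objects are only created; they are never started or joined, and one process per record would be expensive anyway. Below is a rough sketch of one way the work could be structured with a process pool. It is not a tested solution for your data. The function names summarize, iter_summaries and run_parallel are hypothetical, the "best percent identity" calculation is only a placeholder for your own, and the processes and chunksize values are arbitrary. It also assumes that the calculation, not the XML parsing, is the slow part, since parsing stays serial in the parent process.

```python
import multiprocessing as mp


def summarize(blast_record):
    """Reduce a Biopython BLAST record to plain data that is cheap to pickle."""
    hsps = [
        (alignment.hit_id, hsp.identities, hsp.align_length)
        for alignment in blast_record.alignments
        for hsp in alignment.hsps
    ]
    return blast_record.query, hsps


def iter_summaries(xml_path):
    """Parse lazily in the parent; only small tuples are sent to the workers."""
    # Imported here so the rest of the sketch runs without Biopython installed.
    from Bio.Blast import NCBIXML

    with open(xml_path) as handle:
        for blast_record in NCBIXML.parse(handle):
            yield summarize(blast_record)


def calculation(summary):
    """Placeholder calculation (an assumption): best percent identity per query."""
    query, hsps = summary
    best = max(
        (100.0 * identities / length for _, identities, length in hsps),
        default=0.0,
    )
    return query, best


def run_parallel(summaries, processes=4, chunksize=50):
    """Run calculation() in a pool and collect the results in the parent."""
    with mp.Pool(processes=processes) as pool:
        # Results arrive in arbitrary order; each one carries its query name.
        return list(pool.imap_unordered(calculation, summaries, chunksize=chunksize))

# Call run_parallel() from under an `if __name__ == "__main__":` guard,
# which multiprocessing needs on platforms that spawn new interpreters.
```

With something like this, the parent would call results = run_parallel(iter_summaries(path)) and then write all rows through a single pymysql connection, for example with cursor.executemany. Keeping the database work in the parent avoids sharing one connection between processes, which is generally not safe. If the uploads must happen in the workers instead, each worker would probably need to open its own connection.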
