Question: Parallelization of a Biopython process
10 months ago by dllopezr60 wrote:


I am parsing BLAST XML outputs with Biopython and, for performance, I want to parallelize the process. The code takes a large BLAST XML output, performs a calculation on the alignments, and saves the result to a SQL database.

The code is as follows:

from Bio.Blast import NCBIXML
import pymysql

result_handle = open(file)  # file: path to the BLAST XML output
blast_records = NCBIXML.parse(result_handle)

# start SQL connection

def calculation(blast_record):
    # do calculation
    # upload to SQL
Normally the code is executed like this:

for blast_record in blast_records:
    calculation(blast_record)

But when I try to use tools such as multiprocessing or joblib with a list comprehension like this:

processes = [mp.Process(target=calculation, args=(blast_record)) for blast_record in blast_records]

I get either an error or the code runs indefinitely without producing results.
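One likely source of the error in the snippet above is that `args=(blast_record)` is not a tuple — without a trailing comma the parentheses do nothing, so a single record is passed where a tuple is expected — and each `Process` also has to be started and joined. A minimal sketch of the corrected `Process` pattern, with a placeholder `work` function standing in for `calculation` (spawning one process per record would still be impractical for a large file):

```python
from multiprocessing import Process, Queue

def work(item, queue):
    # placeholder for the real per-record calculation
    queue.put(item * 2)

def run_processes(items):
    queue = Queue()
    # note the trailing comma semantics: args must be a tuple,
    # e.g. args=(item,) for one argument; args=(item) is just item
    processes = [Process(target=work, args=(item, queue)) for item in items]
    for p in processes:
        p.start()
    # drain the queue before joining to avoid blocking on large payloads
    results = sorted(queue.get() for _ in items)
    for p in processes:
        p.join()
    return results
```

Calling `run_processes([1, 2, 3])` returns `[2, 4, 6]`.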

Any advice on how to structure the code so it can be parallelized?
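For a large file, a worker pool is the usual pattern rather than one process per record: `multiprocessing.Pool` consumes the lazy `NCBIXML.parse` generator and pickles records to a fixed number of workers. A minimal sketch with a placeholder `calculation` (the real one would need a per-worker SQL connection opened inside the worker, since database connections cannot be pickled):

```python
from multiprocessing import Pool

def calculation(record):
    # placeholder for the real per-record work; here it just
    # returns the record's length for illustration
    return len(record)

def run_parallel(records, workers=4):
    # Pool.imap consumes the (possibly lazy) iterable of records,
    # sends each record to a worker, and yields results in order
    with Pool(processes=workers) as pool:
        return list(pool.imap(calculation, records, chunksize=8))

if __name__ == "__main__":
    # with Biopython this would be, for example:
    #   records = NCBIXML.parse(open(file))
    records = ["ACGT", "AC", "ACGTACGT"]
    print(run_parallel(records))  # [4, 2, 8]
```

`chunksize` batches records per pickling round-trip, which matters when individual records are cheap to process.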


I got either an error

You'll have to be more specific here.

written 10 months ago by WouterDeCoster44k

