Question: Parallelization of a Biopython process
21 days ago, dllopezr wrote:

Hi!

I am parsing BLAST XML output with Biopython and, for performance reasons, I want to parallelize the process. The code takes a large BLAST XML output file, performs a calculation on the alignments, and saves the result in an SQL database.

The code is as follows:

from Bio.Blast import NCBIXML
import pymysql
result_handle = open(*file*)
blast_records = NCBIXML.parse(result_handle)

# start SQL connection here

def calculation(blast_record):
    # do calculation
    # upload to SQL
    ...

Normally the code is executed serially, like this:

for blast_record in blast_records:
    calculation(blast_record)

I tried to use tools such as multiprocessing or joblib, with a list comprehension like this:

processes = [mp.Process(target=calculation, args=(blast_record)) for blast_record in blast_records]

But I get either an error or the code runs indefinitely without producing results.

Any help on how to structure the code to parallelize it, or any other advice?


"I got either an error"

You'll have to be more specific here.

written 21 days ago by WouterDeCoster
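
Without the exact error it is hard to be sure, but a few things in the snippet stand out. `args=(blast_record)` is not a tuple (it needs a trailing comma: `args=(blast_record,)`). `mp.Process` objects do nothing until `start()` is called on them. And creating one process per record is expensive for a large file. Below is a minimal sketch of one possible structure, not a definitive implementation. It assumes the records pickle cleanly, that the per-record work is CPU-bound, and that the database writes can stay in the parent process so no connection is shared between workers. The body of `calculation`, the `process_records` helper, and the `workers` and `chunksize` defaults are hypothetical placeholders.

```python
import multiprocessing as mp
import sys


def calculation(blast_record):
    """Hypothetical stand-in for the real per-record calculation.

    Runs in a worker process and only returns plain data;
    it does not touch the database.
    """
    return (blast_record.query, len(blast_record.alignments))


def process_records(blast_records, workers=4, chunksize=50):
    """Yield one result per record, computed by a pool of worker processes.

    Pool.imap accepts the parser's generator directly and yields
    results in the same order as the input records.
    """
    with mp.Pool(processes=workers) as pool:
        yield from pool.imap(calculation, blast_records, chunksize=chunksize)


def main(xml_path):
    # Imported here so the helpers above can be used without Biopython.
    from Bio.Blast import NCBIXML

    with open(xml_path) as result_handle:
        blast_records = NCBIXML.parse(result_handle)
        for query, n_alignments in process_records(blast_records):
            # Upload to SQL here, from the parent process only,
            # using a single connection opened in the parent.
            print(query, n_alignments)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running the script body
    # on platforms that start workers by importing the main module.
    if len(sys.argv) == 2:
        main(sys.argv[1])
```

If the database upload has to happen inside the workers instead, each worker should open its own connection rather than reuse one created before the pool was started.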