Question: Running BLAST tool using python's multiprocessing package
21 months ago
wrote:

I am trying to run online NCBI BLAST in parallel using Python's multiprocessing package. While running the code, the following error occurred:

Process Process-4:
Traceback (most recent call last):
  File "C:\Users\muh_asif\AppData\Local\Programs\Python\Python37\lib\multiprocessing\", line 297, in _bootstrap
  File "C:\Users\muh_asif\AppData\Local\Programs\Python\Python37\lib\multiprocessing\", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\muh_asif\PycharmProjects\parellel\", line 16, in f
    result_handle = NCBIWWW.qblast("blastn", "nt", record.format("fasta"),entrez_query=j, hitlist_size=1)
  File "C:\Users\muh_asif\PycharmProjects\parellel\venv\lib\site-packages\Bio\Blast\", line 141, in qblast
    rid, rtoe = _parse_qblast_ref_page(handle)
  File "C:\Users\muh_asif\PycharmProjects\parellel\venv\lib\site-packages\Bio\Blast\", line 253, in _parse_qblast_ref_page
    raise ValueError("Error message from NCBI: %s" % msg)
ValueError: Error message from NCBI: Cannot accept request, error code: -1

The same error occurred in several other processes, for example Process-5 and Process-6.

Apparently NCBI rejected the requests from some of the processes. Is there a way to fix this error? Am I allowed to submit 3 or 4 queries to NCBI at the same time?

Secondly, processes are only created for the first element of taxa_id_list, not for the second. Is there a better way to run BLAST in parallel using the multiprocessing package? I am new to multiprocessing and I am trying to make BLAST run faster. Here is the link for the input file (input_file), and the code is:

from multiprocessing import current_process
from Bio.Blast import NCBIXML
from Bio.Blast import NCBIWWW
from Bio import SeqIO

def f(record, j, id):
    record = str(record)
    j = str(j)
    proc_name = current_process().name
    print(f"Process name: {proc_name}")

    result_handle = NCBIWWW.qblast("blastn", "nt", record.format("fasta"),entrez_query=j, hitlist_size=1)
    blast_records = NCBIXML.parse(result_handle)

    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            print(f"accession num: {alignment.accession} for ID: {id}")

if __name__ == '__main__':

    from multiprocessing import Process

    fasta_file_name = 'dummy_fasta.fasta'
    my_fasta = SeqIO.parse(fasta_file_name, "fasta")
    # Restrict BLAST to specific species.
    taxa_id_list = ["txid9606 [ORGN]", "txid39442 [ORGN]"]

    processes = []
    for j in taxa_id_list:
        for k in my_fasta: # read all sequences from fasta file
            seq = k.seq
            id = k.id
            process = Process(target=f, args=(seq, j, id))
            processes.append(process)
            process.start()
    for l in processes:
        l.join()  # wait for every child process to finish

Thank you.

NCBI is very likely to rate-limit you. Try sending at most 2 or 3 requests at once.
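
One way to stay within such a limit is a fixed-size worker pool instead of one process per query. The sketch below is only an illustration: `run_limited` and its `submit` parameter are hypothetical names, and `submit` stands in for whatever function actually performs one `NCBIWWW.qblast` request. It uses `multiprocessing.pool.ThreadPool` rather than separate processes, on the assumption that the work is mostly waiting on the network.

```python
from itertools import product
from multiprocessing.pool import ThreadPool


def run_limited(submit, records, taxa_queries, max_concurrent=2):
    """Call submit(record, query) for every pair, at most max_concurrent at a time.

    `submit` is a placeholder for the function that performs one BLAST request.
    """
    # Materialize the records so they can be paired with every taxon query.
    records = list(records)
    jobs = list(product(taxa_queries, records))
    # The pool size caps how many requests are in flight at the same time.
    with ThreadPool(processes=max_concurrent) as pool:
        return pool.starmap(lambda query, record: submit(record, query), jobs)
```

A call might look like `run_limited(my_blast_function, records, taxa_id_list, max_concurrent=2)`, where `my_blast_function` is your own wrapper around `qblast`. Whether 2 is low enough is for NCBI's usage guidelines to decide, not this sketch.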

BLAST itself is multi-threaded, so each job you start can use more than one thread. I am not sure why you want to use multiprocessing to submit remote BLAST jobs. Please be considerate of this public resource.

The NCBI WWW BLAST server is a shared resource, and it would be unfair for a few users to monopolize it. To prevent this, the server gives priority to interactive users who run a moderate number of searches. The server also keeps track of how many queries are in the queue for each user as well as how many searches a user has performed recently and prioritizes searches accordingly.
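
If a local BLAST+ installation and a local copy of the database are available (an assumption; the thread does not say so), the multi-threading of BLAST is exposed through the `-num_threads` option of `blastn`, and no shared server is involved. The sketch below mainly builds the command line; `build_blastn_command`, `run_local_blast`, and the file names are illustrative.

```python
import subprocess


def build_blastn_command(query_fasta, db, out_xml, threads=4):
    """Return a blastn command that uses several threads for a single job."""
    return [
        "blastn",
        "-query", query_fasta,
        "-db", db,
        "-num_threads", str(threads),
        "-outfmt", "5",  # 5 = XML, which Bio.Blast.NCBIXML can parse
        "-out", out_xml,
    ]


def run_local_blast(query_fasta, db, out_xml, threads=4):
    """Run blastn; requires BLAST+ on PATH and a formatted local database."""
    subprocess.run(build_blastn_command(query_fasta, db, out_xml, threads), check=True)
```

With this approach all sequences in the FASTA file go into one job, so there is nothing to parallelize on the Python side.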

Devon Ryan and genomax, thank you for your replies. I was not aware of the NCBI restrictions. From now on, I will submit at most one or two requests to BLAST at once.
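
A likely cause of the second problem in the question (processes only being created for the first element of taxa_id_list) is that `SeqIO.parse` returns an iterator, which can be consumed only once. The inner loop uses it up during the first taxon, so nothing is left for the second. The sketch below reproduces the effect with plain Python; `pair_all` is an illustrative name and `iter([...])` stands in for `SeqIO.parse(...)`.

```python
def pair_all(taxa, records):
    """Pair every taxon with every record using the question's nested loops."""
    pairs = []
    for j in taxa:
        for k in records:
            pairs.append((j, k))
    return pairs


taxa = ["txid9606 [ORGN]", "txid39442 [ORGN]"]

# An iterator can be walked only once, so the second taxon sees no records.
one_shot = pair_all(taxa, iter(["seq1", "seq2"]))
# A list can be walked again for each taxon.
reusable = pair_all(taxa, list(iter(["seq1", "seq2"])))

print(len(one_shot))  # 2: only the first taxon was paired
print(len(reusable))  # 4: both taxa were paired
```

In the question's code this would mean writing `my_fasta = list(SeqIO.parse(fasta_file_name, "fasta"))`, which is fine as long as the file fits in memory.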

5 weeks ago
wrote:

This might be a bit late for you, but for anyone else: I think I got similar error messages beginning with Process Process... when I didn't enclose my multiprocessing code in an if __name__ == '__main__' guard (this is standard practice for multiprocessing). I could be misremembering, though; it's been a while since I debugged it!
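
For readers who have not seen the guard: on platforms where multiprocessing starts children with the spawn method (Windows, for example), each child re-imports the main module, so process-creating code at module level would run again in every child. Note that the code in the question already has the guard, so it is probably not the cause of the NCBI error there. A minimal sketch, with `square` and `main` as illustrative names:

```python
from multiprocessing import Pool


def square(x):
    """Work done in a child process; must be defined at module level."""
    return x * x


def main():
    # A pool of two workers; map() blocks until all results are back.
    with Pool(processes=2) as pool:
        return pool.map(square, range(4))


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when a
    # spawned child process re-imports it.
    print(main())
```

Keeping the worker function and `main` at module level, with only the call to `main()` under the guard, lets child processes import the function without starting new pools themselves.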

