Question: Running BLAST tool using python's multiprocessing package
m_asi0 wrote, 21 months ago (Portugal):

I am trying to run online NCBI BLAST in parallel using Python's multiprocessing package. While running the code, the following error occurred:

Process Process-4:
Traceback (most recent call last):
  File "C:\Users\muh_asif\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()
  File "C:\Users\muh_asif\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\muh_asif\PycharmProjects\parellel\index.py", line 16, in f
    result_handle = NCBIWWW.qblast("blastn", "nt", record.format("fasta"),entrez_query=j, hitlist_size=1)
  File "C:\Users\muh_asif\PycharmProjects\parellel\venv\lib\site-packages\Bio\Blast\NCBIWWW.py", line 141, in qblast
    rid, rtoe = _parse_qblast_ref_page(handle)
  File "C:\Users\muh_asif\PycharmProjects\parellel\venv\lib\site-packages\Bio\Blast\NCBIWWW.py", line 253, in _parse_qblast_ref_page
    raise ValueError("Error message from NCBI: %s" % msg)
ValueError: Error message from NCBI: Cannot accept request, error code: -1

This error occurred for several processes, for example for processes 5 and 6 as well.

Apparently NCBI did not accept requests from some processes. Is there a way to fix this error? Am I allowed to submit 3 or 4 queries to NCBI at the same time?

Secondly, processes are only created for the first element of the taxa_id_list list, not for the second element. Is there a better way to run BLAST in parallel using the multiprocessing package? I am new to multiprocessing and I am trying to make BLAST run faster. Here is the link for the input file (input_file) and the code is:

from multiprocessing import current_process
from Bio.Blast import NCBIXML
from Bio.Blast import NCBIWWW
from Bio import SeqIO

def f(record, j, id):
    record = str(record)
    print(record)
    j = str(j)
    print(j)
    proc_name = current_process().name
    print(f"Process name: {proc_name}")

    result_handle = NCBIWWW.qblast("blastn", "nt", record.format("fasta"), entrez_query=j, hitlist_size=1)
    blast_records = NCBIXML.parse(result_handle)

    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            print(f"accession num: {alignment.accession} for ID: {id}")



if __name__ == '__main__':

    from multiprocessing import Process

    fasta_file_name = 'dummy_fasta.fasta'
    my_fasta = SeqIO.parse(fasta_file_name, "fasta")
    # to restrict BLAST to a specific species
    taxa_id_list = ["txid9606 [ORGN]", "txid39442 [ORGN]"]

    processes = []
    for j in taxa_id_list:
        for k in my_fasta: # read all sequences from fasta file
            seq = k.seq
            id = k.id
            process = Process(target=f, args=(seq, j, id))
            processes.append(process)
            process.start()
    for l in processes:
        l.join()

thank you.

modified 5 weeks ago by timothy.kirkwood20 • written 21 months ago by m_asi0

NCBI is very likely to rate limit you; try sending at most 2 or 3 requests at once.
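One way to enforce that cap is a small multiprocessing.Pool whose worker count limits how many requests are in flight at once. A minimal sketch: run_one_blast and its task tuples are placeholders here, and in the real code its body would be the NCBIWWW.qblast call from the question.

```python
from multiprocessing import Pool

MAX_CONCURRENT = 2  # keep this small to stay within NCBI's tolerance

def run_one_blast(task):
    seq_id, entrez_query = task
    # Placeholder for the real call, e.g.:
    # result_handle = NCBIWWW.qblast("blastn", "nt", seq,
    #                                entrez_query=entrez_query, hitlist_size=1)
    return f"{seq_id}|{entrez_query}"

def blast_all(tasks):
    # The pool never runs more than MAX_CONCURRENT workers at a time,
    # so at most that many requests hit the server simultaneously.
    with Pool(processes=MAX_CONCURRENT) as pool:
        return pool.map(run_one_blast, tasks)

if __name__ == "__main__":
    tasks = [("seq1", "txid9606 [ORGN]"), ("seq2", "txid39442 [ORGN]")]
    print(blast_all(tasks))
```

Unlike starting one Process per (sequence, taxon) pair, the pool queues the extra tasks instead of firing them all at the server at once.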

written 21 months ago by Devon Ryan98k

BLAST itself is multi-threaded, so each job you start can use more than one thread. I am not sure why you want to use multiprocessing to submit remote BLAST jobs. Please be considerate of this public resource.

The NCBI WWW BLAST server is a shared resource, and it would be unfair for a few users to monopolize it. To prevent this, the server gives priority to interactive users who run a moderate number of searches. The server also keeps track of how many queries are in the queue for each user as well as how many searches a user has performed recently and prioritizes searches accordingly.

written 21 months ago by GenoMax96k

Devon Ryan and GenoMax, thank you for your replies. I was not aware of the NCBI restrictions. Now I will submit at most one or two requests to BLAST at once.

written 21 months ago by m_asi0
timothy.kirkwood20 wrote, 5 weeks ago:

This might be a bit late for you, but for anyone else: I think I got similar error messages beginning with "Process Process-..." when I didn't enclose my multiprocessing code in an if __name__ == '__main__' guard (this is standard practice for multiprocessing). Could be dreaming though; it's been a while since I debugged it!
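For reference, a minimal guarded layout looks like this. This is only a sketch; the Queue is there so the parent can read something back from the child, and work stands in for whatever the child should actually do.

```python
from multiprocessing import Process, Queue

def work(q, name):
    # runs in the child process
    q.put(f"hello from {name}")

def run_guarded():
    q = Queue()
    p = Process(target=work, args=(q, "child"))
    p.start()
    msg = q.get()  # read before join() so the child is not blocked writing
    p.join()
    return msg

# Without this guard, start methods that re-import the main module
# (the default on Windows) re-run the top-level code in every child,
# which can produce errors like the "Process Process-N" traceback above.
if __name__ == "__main__":
    print(run_guarded())
```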

modified 5 weeks ago • written 5 weeks ago by timothy.kirkwood20
Powered by Biostar version 2.3.0