Running BLAST using Python's multiprocessing package
3.0 years ago by m_asi

I am trying to run NCBI's online BLAST in parallel using Python's multiprocessing package. While running the code, the following error occurred:

Process Process-4:
Traceback (most recent call last):
  File "C:\Users\muh_asif\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()
  File "C:\Users\muh_asif\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\muh_asif\PycharmProjects\parellel\index.py", line 16, in f
    result_handle = NCBIWWW.qblast("blastn", "nt", record.format("fasta"),entrez_query=j, hitlist_size=1)
  File "C:\Users\muh_asif\PycharmProjects\parellel\venv\lib\site-packages\Bio\Blast\NCBIWWW.py", line 141, in qblast
    rid, rtoe = _parse_qblast_ref_page(handle)
  File "C:\Users\muh_asif\PycharmProjects\parellel\venv\lib\site-packages\Bio\Blast\NCBIWWW.py", line 253, in _parse_qblast_ref_page
    raise ValueError("Error message from NCBI: %s" % msg)
ValueError: Error message from NCBI: Cannot accept request, error code: -1

This error occurred for several processes, for example for processes 5 and 6 as well.

Apparently NCBI did not accept requests from some of the processes. Is there a way to fix this error? Am I allowed to submit 3 or 4 queries to NCBI at the same time?

Secondly, processes are only created for the first element of the taxa_id_list list, not for the second element. Is there a better way to run BLAST in parallel using the multiprocessing package? I am new to multiprocessing and I am trying to make BLAST run faster. Here is the link for the input file (input_file), and the code is:

from multiprocessing import current_process
from Bio.Blast import NCBIXML
from Bio.Blast import NCBIWWW
from Bio import SeqIO

def f(record, j, id):
    record = str(record)
    print(record)
    j = str(j)
    print(j)
    proc_name = current_process().name
    print(f"Process name: {proc_name}")

    result_handle = NCBIWWW.qblast("blastn", "nt", record.format("fasta"), entrez_query=j, hitlist_size=1)
    blast_records = NCBIXML.parse(result_handle)

    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            print(f"accession num: {alignment.accession} for ID: {id}")



if __name__ == '__main__':

    from multiprocessing import Process

    fasta_file_name = 'dummy_fasta.fasta'
    my_fasta = SeqIO.parse(fasta_file_name, "fasta")
    # restrict BLAST to a specific species
    taxa_id_list = ["txid9606 [ORGN]", "txid39442 [ORGN]"]

    processes = []
    for j in taxa_id_list:
        for k in my_fasta: # read all sequences from fasta file
            seq = k.seq
            id = k.id
            process = Process(target=f, args=(seq, j, id))
            processes.append(process)
            process.start()
    for l in processes:
        l.join()

Thank you.

NCBI BLAST python multiprocessing

NCBI is very likely to rate-limit you; try sending at most 2 or 3 requests at once.
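One way to cap the number of simultaneous requests is a `multiprocessing.Pool` with a small worker count. A minimal sketch (the `blast_one` body is a stand-in; the real worker would call `NCBIWWW.qblast` as in the question's `f()`). Building the full task list up front also sidesteps the second problem in the question: `SeqIO.parse` returns a one-shot iterator, so looping over it a second time for the next taxon yields nothing.

```python
from multiprocessing import Pool

def blast_one(task):
    """Stand-in worker: in the real version, call NCBIWWW.qblast here
    and parse the handle with NCBIXML, as in the question's f()."""
    record_id, taxon = task
    # result_handle = NCBIWWW.qblast("blastn", "nt", fasta_text,
    #                                entrez_query=taxon, hitlist_size=1)
    return record_id, taxon

if __name__ == "__main__":
    # records would come from list(SeqIO.parse(...)) -- a list, not the
    # one-shot iterator, so it can be traversed once per taxon
    records = ["seq1", "seq2"]
    taxa = ["txid9606 [ORGN]", "txid39442 [ORGN]"]
    tasks = [(r, t) for t in taxa for r in records]
    # Pool(processes=2) caps concurrent NCBI requests at 2
    with Pool(processes=2) as pool:
        for record_id, taxon in pool.map(blast_one, tasks):
            print(record_id, taxon)
```

With `Process` objects started in a loop, every task runs at once; a `Pool` queues the extra tasks until a worker is free, which is what NCBI's limits call for.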


BLAST itself is multi-threaded, so each job you start can use more than one thread. I am not sure why you want to use multiprocessing to submit remote BLAST jobs. Please be considerate of this public resource.

The NCBI WWW BLAST server is a shared resource, and it would be unfair for a few users to monopolize it. To prevent this, the server gives priority to interactive users who run a moderate number of searches. The server also keeps track of how many queries are in the queue for each user as well as how many searches a user has performed recently and prioritizes searches accordingly.


Devon Ryan and genomax, thank you for your replies. I was not aware of the NCBI restrictions. Now I will submit at most one or two requests to BLAST at once.

15 months ago

This might be a bit late for you, but for anyone else: I think I got similar error messages beginning with Process Process-... when I didn't enclose my multiprocessing code in an if __name__ == '__main__' guard (this is standard practice for multiprocessing). Could be dreaming though; it's been a while since I debugged it!
