I have a multi-FASTA file containing ~125 protein sequences, and I need to run a BLASTP search against the remote nr database. I tried using NcbiblastpCommandline, but it only accepts files as input. Since my file contains a large number of sequences, I get this error:

    ERROR: An error has occurred on the server, [blastsrv4.REAL]:Error: CPU usage limit was exceeded, resulting in SIGXCPU (24).

Writing each sequence from the multi-FASTA file to its own file, one at a time, works, but then the BLAST search becomes tremendously slow (~10 min/query on average, as opposed to ~1 min/query on the NCBI site).
    from Bio.Blast.Applications import NcbiblastpCommandline
    from Bio import SeqIO

    blastp_results = []
    for record in SeqIO.parse("AmpB_DEPs.fasta", "fasta"):
        # Write the current record to a temporary single-sequence FASTA file
        with open("test.txt", "w") as f1:
            f1.write(">" + record.description + "\n" + str(record.seq) + "\n")
        # Run the search on NCBI's servers (remote=True adds the -remote switch)
        blastp_cline = NcbiblastpCommandline(query="test.txt", db="nr", remote=True,
                                             evalue=0.05, outfmt="7 sseqid evalue qcovs pident")
        res = blastp_cline()  # returns a (stdout, stderr) tuple
        blastp_results.append(res)
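For reference, a middle ground between submitting all ~125 sequences at once and one file per sequence would be to write small batches to each query file. The following is only a sketch: the helper names `batch_records` and `to_fasta` are hypothetical, and whether any particular batch size stays under the server's CPU limit is an assumption I have not tested.

```python
from itertools import islice


def batch_records(records, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def to_fasta(chunk):
    """Format (description, sequence) pairs as multi-FASTA text."""
    return "".join(f">{description}\n{sequence}\n" for description, sequence in chunk)
```

With Biopython, the pairs could be built as `(record.description, str(record.seq))` for each record from `SeqIO.parse`, and each batch written to its own query file.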
I also tried using NCBIWWW.qblast, but it doesn't seem to provide query coverage information in its output, which is important for my study.
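To make clear what I mean by query coverage: the percentage of query residues covered by at least one HSP of a given subject. A rough sketch of that calculation is below. The function name `query_coverage` is hypothetical, and I am assuming the HSP coordinates are 1-based and inclusive (for example `hsp.query_start` and `hsp.query_end` from parsed XML output). I have not checked that this reproduces BLAST's own `qcovs` value exactly.

```python
def query_coverage(hsp_ranges, query_length):
    """Percent of query positions covered by at least one HSP.

    hsp_ranges: iterable of (start, end) query coordinates, 1-based inclusive.
    query_length: length of the query sequence in residues.
    """
    covered = 0
    current_end = 0  # last query position already counted
    # Normalise each range so start <= end, then sweep from left to right
    for start, end in sorted((min(s, e), max(s, e)) for s, e in hsp_ranges):
        start = max(start, current_end + 1)  # skip positions counted earlier
        if end >= start:
            covered += end - start + 1
            current_end = end
    return 100.0 * covered / query_length
```

For example, two overlapping HSPs spanning 1-50 and 40-80 on a 100-residue query cover 80 positions, so `query_coverage([(1, 50), (40, 80)], 100)` gives 80.0.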
Can somebody suggest a way to deal with this issue without compromising on the search space or BLAST's default parameters? Suggestions on running BLAST from other languages, such as Perl or R, would also be appreciated.