BLASTX script doesn't print results and never ends
5 weeks ago
langziv ▴ 20

Hi.

I ran this script before and it worked fine. Maybe there's a small change that causes this. Also, there's no error message.

Here's the script:

#!/bin/bash
#PBS -N blastx
#PBS -e /err_and_out_files/blastx.ER
#PBS -o /err_and_out_files/blastx.OU
#PBS -l nodes=compute-0-311:ppn=20,mem=100gb
export BLASTDB="/bioseq/biodb/BLAST/Proteins2/taxdb"

blastx -query /output/fasta_files/btcaA1_filtered.fa -db /bioseq/biodb/BLAST/Proteins2/nr -max_hsps 1 -max_target_seqs 10 -num_threads 4 -evalue 1e-5 -out /output/blast/blastx/btcaA1_filtered.txt -outfmt "6 qseqid sseqid pident staxids sskingdoms qstart qend qlen length sstart send slen evalue mismatch gapopen bitscore stitle"


Thanks!

blastx command-line • 339 views
5 weeks ago
Mensur Dlakic ★ 11k

Chances are you simply haven't waited long enough: I am guessing that your query file is large, and your database obviously is. Try the same command with a smaller database such as SwissProt. If that works, you will either need more patience or a database in which redundancy has been removed above a certain identity threshold (say, UniRef90).
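A minimal sketch of that sanity check, reusing the query path and search options from the script in the question. The SwissProt database location (`/path/to/swissprot`) and the output file name are placeholders that are not in the original post, so adjust them to your system. The script prints the command, skips the run if `blastx` is not on the PATH, and otherwise reports how long the search took.

```bash
#!/bin/bash
# Sanity check: the same blastx search, but against a much smaller database.
# SMALL_DB and OUT are hypothetical placeholders; QUERY comes from the original script.
QUERY="${QUERY:-/output/fasta_files/btcaA1_filtered.fa}"
SMALL_DB="${SMALL_DB:-/path/to/swissprot}"
OUT="${OUT:-swissprot_test.txt}"

run_check() {
  # Build the command as an array so paths with spaces survive intact.
  local cmd=(blastx -query "$QUERY" -db "$SMALL_DB" -max_hsps 1
             -max_target_seqs 10 -num_threads 4 -evalue 1e-5
             -outfmt 6 -out "$OUT")
  echo "Command: ${cmd[*]}"

  if ! command -v blastx >/dev/null 2>&1; then
    echo "blastx not found in PATH; skipping the run" >&2
    return 0
  fi

  # SECONDS is a bash built-in counter of elapsed wall-clock seconds.
  local start=$SECONDS
  "${cmd[@]}"
  local status=$?
  echo "blastx exit status $status after $((SECONDS - start)) s"
  return $status
}

run_check
```

If this finishes within minutes and writes hits to the output file, the software and database setup are probably fine and the nr search is simply slow.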


Just realized that a near-identical question of yours was already answered here. Since that answer was accepted, I assumed that it solved your problem.


Thank you, Mensur Dlakic. Since you mentioned SwissProt, would you use both NCBI's database and SwissProt for BLASTX? Maybe it's a good idea to search multiple databases, provided they are trustworthy.


SwissProt is a curated database of proteins with known function and reliable annotation. It has fewer than a million sequences, if I remember correctly, and it is not meant for large-scale searching. Besides, all of its sequences are already included in the nr database. I suggested it as a quick way to check whether your software and hardware setup is correct, because the search should finish in less than 1% of the time an nr search takes. UniRef90, on the other hand, is a good substitute for nr in my opinion, and is about 40% of the size of nr.


From your experience, is it normal for a blastx run to last multiple days, with no output written, when searching BLAST's protein database with an input FASTA file that contains a single sequence of 10,368 base pairs? Or does that indicate that something is not working?
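Before drawing conclusions from the run time, it may be worth confirming that the query file really holds one sequence of the expected length. The sketch below is one way to do that with `awk`; the function name `fasta_stats` is made up for illustration, and the query path is the one from the script in the question.

```bash
#!/bin/bash
# Count the sequences and total residues in a FASTA file.
fasta_stats() {
  awk '
    /^>/ { n++; next }                        # header line: one more sequence
    { gsub(/[[:space:]]/, ""); len += length($0) }  # sequence line: add its length
    END { printf "sequences=%d bases=%d\n", n, len }
  ' "$1"
}

QUERY="${QUERY:-/output/fasta_files/btcaA1_filtered.fa}"
if [ -r "$QUERY" ]; then
  fasta_stats "$QUERY"
else
  echo "Query file not readable: $QUERY" >&2
fi
```

For the file described here, the expected report would be one sequence and 10,368 bases; a much larger count would by itself explain a long run.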


I thought everything was explained in my previous answer, but I will try again.

You seem to be using a shared computer and running this through some kind of batch submission system. It is not normal for a blastx run on a single sequence to take multiple days, but your system may be reading the database slowly because of swapping or high load. Or something could be wrong with your programs and/or database setup. That is why I suggested trying SwissProt, which is a small fraction of the nr database. If a search against SwissProt does not finish in a matter of minutes, that should help you tell whether the cause is a slow computer system or a faulty software setup.
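A rough sketch of how one might look at those possibilities on a Linux compute node. It assumes the commands are run on the node where the job executes, that `nproc` and `free` are available, and that the database path is the one from the script in the question; the helper name `load_per_core` is made up for illustration.

```bash
#!/bin/bash
# Ratio of load average to core count; values well above 1 hint at an overloaded node.
load_per_core() {
  awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# 1-minute load average relative to the number of cores (Linux only).
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
  read -r load1 _ < /proc/loadavg
  echo "1-min load per core: $(load_per_core "$load1" "$(nproc)")"
fi

# Memory and swap usage; heavy swap use while BLAST reads the database slows it badly.
if command -v free >/dev/null 2>&1; then
  free -h
fi

# Can BLAST itself open the database? Path taken from the original script.
DB="${DB:-/bioseq/biodb/BLAST/Proteins2/nr}"
if command -v blastdbcmd >/dev/null 2>&1; then
  blastdbcmd -db "$DB" -info || echo "blastdbcmd could not read $DB" >&2
fi
```

A high load per core or substantial swap use would point toward a slow system, while a `blastdbcmd` failure would point toward the database setup.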