BLASTX script doesn't print results and never ends
10 weeks ago
langziv ▴ 20

Hi.

I ran this script before and it worked fine. Maybe there's a small change that causes this. Also, there's no error message.

Here's the script:

#!/bin/bash
#PBS -N blastx
#PBS -e /err_and_out_files/blastx.ER
#PBS -o /err_and_out_files/blastx.OU
#PBS -l nodes=compute-0-311:ppn=20,mem=100gb
export BLASTDB="/bioseq/biodb/BLAST/Proteins2/taxdb"
module load blast/blast-2.10.0

blastx -query /output/fasta_files/btcaA1_filtered.fa -db /bioseq/biodb/BLAST/Proteins2/nr -max_hsps 1 -max_target_seqs 10 -num_threads 4 -evalue 1e-5 -out /output/blast/blastx/btcaA1_filtered.txt -outfmt "6 qseqid sseqid pident staxids sskingdoms qstart qend qlen length sstart send slen evalue mismatch gapopen bitscore stitle"

Thanks!

blastx command-line • 377 views
10 weeks ago
Mensur Dlakic ★ 12k

Chances are that you haven't waited long enough: I am guessing that your query file is large, and your database (nr) certainly is. Try the same command with a smaller database such as SwissProt. If that works, you will either need more patience or a database in which redundancy has been removed above a certain identity threshold (say, UniRef90).
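A minimal sketch of that test, keeping the options from your script and changing only the database and the output file. The SwissProt database path and the output file name are placeholders I made up; point them at whatever exists on your cluster, and keep the same #PBS header and module load line if you submit it as a job.

```shell
#!/bin/bash
# Sanity check: same blastx options as the original job, but against a
# small database, so the run should finish quickly if the setup is sound.
# SWISSPROT_DB and OUT are hypothetical paths, not taken from the original post.
SWISSPROT_DB="${SWISSPROT_DB:-/path/to/swissprot}"
QUERY="${QUERY:-/output/fasta_files/btcaA1_filtered.fa}"
OUT="${OUT:-/output/blast/blastx/btcaA1_filtered.swissprot.txt}"

run_check() {
    blastx -query "$QUERY" -db "$SWISSPROT_DB" \
        -max_hsps 1 -max_target_seqs 10 -num_threads 4 -evalue 1e-5 \
        -out "$OUT" \
        -outfmt "6 qseqid sseqid pident staxids sskingdoms qstart qend qlen length sstart send slen evalue mismatch gapopen bitscore stitle"
}

if command -v blastx >/dev/null 2>&1; then
    start=$(date +%s)
    run_check
    status=$?
    end=$(date +%s)
    # Report both the exit status and the wall-clock time of the search.
    echo "blastx exited with status $status after $((end - start)) s"
else
    status=127
    echo "blastx not found in PATH; load the BLAST module first"
fi
```

If this finishes quickly and writes hits, the software side is probably fine and the nr run is just slow. If it also hangs or exits with a non-zero status, look at the setup before anything else.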


Just realized that a near-identical question of yours was already answered here. Since that answer was accepted, I assumed that it solved your problem.


Thank you Mensur Dlakic. Since you mentioned SwissProt, would you use both NCBI's database and SwissProt for BLASTX? Maybe it's a good idea to search multiple databases, as long as they are trustworthy.


SwissProt is a curated database of proteins with known function and reliable annotation. It has fewer than a million sequences, if I remember correctly, and it is not meant for large-scale searching. Besides, all of its sequences are already included in the nr database. I suggested it as a quick way of checking whether your software and hardware setup is correct, because the search should finish in less than 1% of the time it takes against nr. UniRef90, on the other hand, is a good substitute for nr in my opinion, and is about 40% of the size of nr.


In your experience, is it normal for a blastx run to last multiple days with no output written, when it is searching the BLAST protein database (nr) and the input FASTA file contains a single sequence of 10,368 base pairs? Or does that indicate that something is not working?


Please do not delete posts that have received feedback.


I thought everything was explained in my previous answer, but I will try again.

You seem to be using a shared computer and running this through some kind of batch submission system. It is not normal for a blastx run on a single sequence to take multiple days, but your system may be reading the database slowly because of swapping or high load. Or something may be wrong with your programs and/or database setup. That is why I suggested that you try SwissProt, which is a small fraction of the nr database. If a search against SwissProt is not done in a matter of minutes, that should tell you whether the problem is a slow computer system or a wrong software setup.
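For what it's worth, here is a rough sketch of how those possibilities could be checked on the compute node itself, either over ssh or at the top of the job script. The run_if_present helper is something I made up for illustration, and the database path is the one from your script. blastdbcmd ships with BLAST+, while free and vmstat are assumed to be available, as on most Linux systems.

```shell
#!/bin/bash
# Quick diagnostics for a blastx job that appears to hang.
DB="${DB:-/bioseq/biodb/BLAST/Proteins2/nr}"

# Hypothetical helper: run a tool only if it is installed, so the script
# keeps going on nodes where some of these tools are missing.
run_if_present() {
    tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipped: $tool (not installed)"
    fi
}

# 1. Can BLAST open the database at all? -info prints a summary of it;
#    an error here points to a database or path problem.
run_if_present blastdbcmd -db "$DB" -info

# 2. Is the node overloaded? Compare the load averages to the core count.
run_if_present uptime

# 3. Is there free memory, and how much swap is in use (values in MB)?
run_if_present free -m

# 4. Is the node actively swapping? Non-zero si/so columns suggest it is.
run_if_present vmstat 1 3
```

An error from blastdbcmd points to the setup, whereas a high load or steady swapping points to the machine.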
