Blast job never stops
8 months ago
langziv ▴ 20

Hi.

Yesterday I started a job that runs BLAST from a script. An output file was created, but now, almost 24 hours later, the file is still empty, no other files have been created, and the job keeps running. I think it won't stop unless I stop it. I was told that it might be due to a contig that's too long.

It may be important to note that initially I couldn't keep the job running: it kept failing because it exceeded the memory limit. To overcome this I set a higher limit of 100 GB. That's the highest I have ever had to set.

The script:

#!/bin/bash
#PBS -q ...
#PBS -N ...
#PBS -e ...
#PBS -o ...
#PBS -l nodes=1:ppn=20,mem=100gb

cd /some/path/

blastx -query A1/scaffold.fa \
-db /root/BLAST/Proteins2/nr \
-max_hsps 1 -max_target_seqs 1 -num_threads 20 \
-out just_trying.txt \
-outfmt "6 std staxids qseqid sseqid staxids sscinames scomnames stitle"


Does anyone have an idea what to do?

blast linux • 240 views

Not directly related to your issue, but a few points to be aware of:

• Be cautious with parameters such as -max_hsps 1 and -max_target_seqs 1 (or at least know very well what they imply). They can cause 'unexpected' results; for example, -max_target_seqs 1 is not guaranteed to return the best hit (search online for details).

• Also, using 20 threads will likely not give much of a speed increase. Only a small part of BLAST is parallelised, and at 20 threads you are almost certainly on the plateau (it has been said that anything above 4-5 threads is unlikely to add much).
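
If you want to check the thread plateau on your own cluster rather than take it on faith, a rough timing loop on a small test query is one way. This is only a sketch: bench_threads is a hypothetical helper (not part of BLAST), test_query.fa is an assumed small FASTA file, and whole-second timing is coarse.

```shell
# Hypothetical helper: run a command once per thread count and print
# "<threads><TAB><elapsed seconds>". It appends "-num_threads N" to the command.
bench_threads() {
    local counts="$1"; shift
    local t start end
    for t in $counts; do
        start=$(date +%s)
        "$@" -num_threads "$t" > /dev/null
        end=$(date +%s)
        printf '%s\t%s\n' "$t" "$((end - start))"
    done
}

# Example call (assumes a small test_query.fa exists; not run here):
#   bench_threads "1 2 4 8" blastx -query test_query.fa \
#       -db /root/BLAST/Proteins2/nr -out /dev/null -outfmt 6
```

If the elapsed time stops dropping after a few threads, requesting fewer cores (ppn) should cost little and may get the job scheduled sooner.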

Log onto the cluster node your job has been allocated and check what's happening there (e.g. with top).
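
A rough sketch of what that check could look like. The scheduler commands are an assumption (PBS/Torque, going by the #PBS lines in the script), and swap_used_kb is a hypothetical helper that reads the standard Linux /proc/meminfo fields.

```shell
# Hypothetical helper: print swap currently in use (kB), read from a
# meminfo-style file. Defaults to /proc/meminfo on Linux.
swap_used_kb() {
    awk '$1 == "SwapTotal:" { total = $2 }
         $1 == "SwapFree:"  { free = $2 }
         END { print total - free }' "${1:-/proc/meminfo}"
}

# Typical sequence (assumes PBS/Torque; <jobid> and <node> are placeholders):
#   qstat -n <jobid>      # shows which node(s) the job was given
#   ssh <node>
#   top -u "$USER"        # is blastx busy on the CPU, or mostly waiting?
#   swap_used_kb          # a large, growing number suggests swapping
```

If blastx shows high CPU usage and swap stays near zero, the job is probably just slow rather than stuck.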

8 months ago
Mensur Dlakic ★ 12k

Files are written in chunks that correspond to sector sizes. BLAST will not have written enough output to go over a sector size until it completes a search with at least one sequence, as hits for each sequence are written at the end. As long as the search is still ongoing with the very first query sequences (assuming you have more than one), it is possible that there will be no output in that file.

There could be multiple explanations: your computer is slow (either objectively, or because it is shared with many other people); 100 GB is still not enough, so there is a lot of disk swapping, which makes everything slow; you have a long query sequence that takes a while to search; or all of the above.

I suggest you take a shorter query sequence and a smaller database, and run a test to make sure that everything works. Assuming it does, you may need more than 100 GB of memory assigned, a smaller database, or a faster computer. If all of that fails, you will need more patience.
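
A minimal sketch of how such a test could be set up with standard tools. Both helper functions are hypothetical (they are not part of BLAST), and /path/to/small_db stands for whatever smaller protein database you have available.

```shell
# Hypothetical helper: print the first N records of a FASTA file.
first_n_records() {
    local n="$1" fasta="$2"
    awk -v n="$n" '/^>/ { seen++ } seen > n { exit } { print }' "$fasta"
}

# Hypothetical helper: print "<length><TAB><name>" per sequence, longest first,
# which makes an unusually long contig easy to spot.
seq_lengths() {
    awk '/^>/ { if (name != "") print len "\t" name
                name = substr($1, 2); len = 0; next }
         { len += length($0) }
         END { if (name != "") print len "\t" name }' "$1" | sort -k1,1nr
}

# Example use (not run here; file and database names are assumptions):
#   seq_lengths A1/scaffold.fa | head
#   first_n_records 5 A1/scaffold.fa > test_query.fa
#   blastx -query test_query.fa -db /path/to/small_db \
#       -num_threads 4 -out test.txt -outfmt 6
```

If the small test finishes and writes hits, the setup itself works and the full run is most likely limited by query length, database size, or memory.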