Question

How to speed up a blastx run on a Slurm HPC cluster

0

3.6 years ago

Bioinfonext ▴ 460

Hi,

I am using the script below to run blastx against the NCBI nr database, but it is taking a very long time. I have around 150,000 sequences in the FASTA file. Is there any way to speed up the analysis?

#!/bin/bash
#SBATCH --job-name=nr      # Job name
#SBATCH --mail-type=END,FAIL         # Mail events (NONE, BEGIN, END, FAIL, ALL) 
#SBATCH --ntasks=40                  # Number of MPI ranks
#SBATCH --nodes=2                    # Number of nodes
#SBATCH --time=8000:00:00              # Time limit hrs:min:sec
#SBATCH --partition=k2-lowpri
#SBATCH --mem=200G

module load apps/ncbiblast/2.10.0/gcc-7.2.0

blastx -query sequecnes.fasta -db /mnt/scratch/ncbi/nr_protein/nr -evalue 1e-5 -outfmt "7 std stitle qseqid sseqid pident length mismatch gapopen qstart qend qcovs qlen slen" -max_target_seqs 1 -out nr.new.txt

Many thanks

BASH UNIX HPC • 2.1k views

ADD COMMENT • link updated 3.6 years ago by Mensur Dlakic ★ 27k • written 3.6 years ago by Bioinfonext ▴ 460

score 2 · Answer 1 · 2020-09-22

3.6 years ago

GenoMax 141k

Add -num_threads to your blastx command line, with a value that matches --ntasks. I would keep the threads on one node, so change --nodes to 1.
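For illustration, the suggestion above could be sketched as the job script below. It reuses the paths, file names and module name from the question. The THREADS variable, the run_blastx helper and the guard around module load are assumptions added for this example, not part of the original script. Slurm sets SLURM_NTASKS inside a job, so -num_threads follows --ntasks without being typed twice.

```shell
#!/bin/bash
#SBATCH --job-name=nr
#SBATCH --mail-type=END,FAIL
#SBATCH --nodes=1                 # keep all BLAST threads on a single node
#SBATCH --ntasks=40               # lower this if your nodes have fewer than 40 cores
#SBATCH --time=8000:00:00
#SBATCH --partition=k2-lowpri
#SBATCH --mem=200G

# Slurm exports SLURM_NTASKS inside a job; fall back to 1 when run elsewhere.
THREADS="${SLURM_NTASKS:-1}"

# Hypothetical helper: the original command plus -num_threads.
run_blastx() {
    blastx -query sequecnes.fasta \
        -db /mnt/scratch/ncbi/nr_protein/nr \
        -evalue 1e-5 \
        -outfmt "7 std stitle qseqid sseqid pident length mismatch gapopen qstart qend qcovs qlen slen" \
        -max_target_seqs 1 \
        -num_threads "$THREADS" \
        -out nr.new.txt
}

# Only start the search if the BLAST module loads (i.e. on the cluster).
if module load apps/ncbiblast/2.10.0/gcc-7.2.0 2>/dev/null; then
    run_blastx
else
    echo "BLAST module not available; skipping blastx" >&2
fi
```

Some sites prefer --ntasks=1 with --cpus-per-task=40 for a single multithreaded program; if yours does, read SLURM_CPUS_PER_TASK instead of SLURM_NTASKS. Check your cluster's documentation for the local convention.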

ADD COMMENT • link 3.6 years ago by GenoMax 141k

0

Thanks, GenoMax.

So should I change the resources like this?

#!/bin/bash
#SBATCH --job-name=nr      # Job name
#SBATCH --mail-type=END,FAIL         # Mail events (NONE, BEGIN, END, FAIL, ALL) 
#SBATCH --ntasks=40                  # Number of MPI ranks
#SBATCH --nodes=1                    # Number of nodes
#SBATCH --time=8000:00:00              # Time limit hrs:min:sec
#SBATCH --partition=k2-lowpri
#SBATCH --mem=200G

Many thanks

ADD REPLY • link 3.6 years ago by Bioinfonext ▴ 460

0

Correct, as long as individual nodes have 40 cores. If not, drop that number to match what your nodes have. You can also use DIAMOND (LINK) as a faster alternative. You will need to create your own indexes, though.
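A rough sketch of the DIAMOND route, under stated assumptions: the location of the nr protein FASTA (NR_FASTA) is a placeholder you would need to replace, and the nr_diamond index name, the output file name and the two helper functions are made up for this example. The query file name is taken from the question. The E-value and single-target settings mirror the blastx command there; DIAMOND's tabular output (format 6) is not identical to BLAST's format 7.

```shell
#!/bin/bash
# Thread count from Slurm when inside a job, otherwise 1.
THREADS="${SLURM_NTASKS:-1}"

# Placeholder path: point this at your local copy of the nr protein FASTA.
NR_FASTA="/path/to/nr.faa"

# Build the DIAMOND index (written as nr_diamond.dmnd); needed only once.
build_index() {
    diamond makedb --in "$NR_FASTA" --db nr_diamond --threads "$THREADS"
}

# Translated search of the nucleotide queries against the protein index.
search() {
    diamond blastx --db nr_diamond \
        --query sequecnes.fasta \
        --out nr.diamond.tsv \
        --evalue 1e-5 \
        --max-target-seqs 1 \
        --outfmt 6 \
        --threads "$THREADS"
}

# Run only where DIAMOND is installed.
if command -v diamond >/dev/null 2>&1; then
    [ -f nr_diamond.dmnd ] || build_index
    search
else
    echo "diamond not found on PATH; nothing was run" >&2
fi
```

Building the index from nr takes a while and a fair amount of disk space, so it is worth keeping the .dmnd file on shared scratch and reusing it across jobs.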

ADD REPLY • link 3.6 years ago by GenoMax 141k

score 1 · Answer 2 · 2020-09-22

3.6 years ago

Mensur Dlakic ★ 27k

Add -num_threads 40 to your blastx command.
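If you are not sure the node really offers 40 cores, one option is to cap the value at what the machine reports. This is only a sketch: the variable names are invented for the example, and it relies on nproc (GNU coreutils) or, failing that, getconf being available on the compute node.

```shell
#!/bin/bash
# Thread count you would like to use.
requested=40

# Cores visible to this process; getconf is a fallback where nproc is missing.
available="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)"

# Never ask BLAST for more threads than there are cores.
if [ "$requested" -gt "$available" ]; then
    threads="$available"
else
    threads="$requested"
fi

# Append this to the blastx command line.
echo "-num_threads $threads"
```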

ADD COMMENT • link 3.6 years ago by Mensur Dlakic ★ 27k

0

Thank you so much! Now I understand it.

ADD REPLY • link 3.6 years ago by Bioinfonext ▴ 460