3.6 years ago
Bioinfonext
Hi,
I am using the script below to run blastx against the NCBI nr database, but it is taking a very long time. I have around 150,000 sequences in the FASTA file. Is there any way to speed up the analysis?
#!/bin/bash
#SBATCH --job-name=nr # Job name
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --ntasks=40 # Number of MPI ranks
#SBATCH --nodes=2 # Number of nodes
#SBATCH --time=8000:00:00 # Time limit hrs:min:sec
#SBATCH --partition=k2-lowpri
#SBATCH --mem=200G
module load apps/ncbiblast/2.10.0/gcc-7.2.0
blastx -query sequecnes.fasta -db /mnt/scratch/ncbi/nr_protein/nr -evalue 1e-5 -outfmt "7 std stitle qseqid sseqid pident length mismatch gapopen qstart qend qcovs qlen slen" -max_target_seqs 1 -out nr.new.txt
Many thanks
thanks genomax,
so should I change the resources like this:
many thanks
Correct, as long as individual nodes have 40 cores. If not, drop that number to match what your nodes have. You can also use DIAMOND (LINK) as a faster alternative. You will need to create your own indexes though.
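The revised resource request is not shown in this thread, so here is a rough sketch of what such a job script might look like. It assumes the advice was to run blastx as one multithreaded process on a single node, since blastx uses threads rather than MPI. The `--cpus-per-task` value and the `THREADS` variable are illustrative choices, not taken from the original posts. The commented DIAMOND lines use hypothetical file names (`nr.faa.gz`, `nr_diamond`, `nr.diamond.tsv`).

```bash
#!/bin/bash
#SBATCH --job-name=nr
#SBATCH --mail-type=END,FAIL
#SBATCH --nodes=1                # blastx is multithreaded, not MPI: keep it on one node
#SBATCH --ntasks=1               # one process ...
#SBATCH --cpus-per-task=40       # ... with 40 threads; lower this to match your nodes
#SBATCH --time=8000:00:00
#SBATCH --partition=k2-lowpri
#SBATCH --mem=200G

# Thread count follows the SLURM allocation; falls back to 1 outside SLURM.
THREADS="${SLURM_CPUS_PER_TASK:-1}"

# Load BLAST+ only where an environment-modules system is available.
if command -v module >/dev/null 2>&1; then
    module load apps/ncbiblast/2.10.0/gcc-7.2.0
fi

# Same search as in the question, with -num_threads added.
cmd=(blastx
     -query sequecnes.fasta
     -db /mnt/scratch/ncbi/nr_protein/nr
     -evalue 1e-5
     -outfmt "7 std stitle qseqid sseqid pident length mismatch gapopen qstart qend qcovs qlen slen"
     -max_target_seqs 1
     -num_threads "$THREADS"
     -out nr.new.txt)

# Print the exact command, then run it only where blastx is installed.
printf '%q ' "${cmd[@]}"; echo
if command -v blastx >/dev/null 2>&1; then
    "${cmd[@]}"
fi

# DIAMOND alternative (sketch only; file names below are hypothetical):
#   diamond makedb --in nr.faa.gz --db nr_diamond
#   diamond blastx --db nr_diamond --query sequecnes.fasta --out nr.diamond.tsv \
#       --outfmt 6 --evalue 1e-5 --max-target-seqs 1 --threads "$THREADS"
```

Without `-num_threads`, blastx runs on a single thread no matter how many cores SLURM allocates, which would explain why requesting 40 tasks across 2 nodes did not help.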