I have a FASTA file containing 140 protein sequences from distinct viruses, and I would like to identify which protein comes from which virus.
I am using a Linux cluster where BLAST is available as a module, and the virus and NCBI nr databases are stored in my own directory on the cluster (correct me if I used the wrong terminology).
I set up my blastp as below:
blastp -db nr -query proteins.fa -outfmt 6 -out ./output.txt -num_threads 10 -max_target_seqs 1
and requested resources from the cluster as:
#PBS -l mem=64gb,nodes=10:ppn=1,walltime=10:00:00
It has been running for around 10 hours, and nothing has been written to output.txt yet. Is there a better way to set the RAM, nodes, or processes per node to speed up the blastp run? Thank you so much!
Here is the info about the Linux cluster:
66 compute nodes. Each node has two 14-core Intel processors (2.40GHz) sharing 128 GB of memory.