Running InterProScan on the cluster
Asked 8.1 years ago by seta ★ 1.9k

Hi all friends,

I am trying to run InterProScan (interproscan-5.14-53.0) on the cluster to search a protein sequence file against the Pfam and ProDom databases. Our cluster runs SGE, so I specified that in the properties file following the program's manual, and I also changed the number of CPUs from the default of 2 to 10 there. I submitted the program via the script below:

#!/bin/bash
#$ -S /bin/bash
#$ -N run_in
#$ -q ltime.q
#$ -cwd
#$ -o output_in.dat
#$ -j no
#$ -pe mpi 27
#$ -v OMP_NUM_THREADS=$NSLOTS

cd /home/seta/software/interproscan-5.14-53.0
./interproscan.sh -mode cluster -clusterrunid i567 -i prot_seq.pep -t p -o prot_out.xml -f xml -appl Pfam,ProDom -goterms -iprlookup

But when I checked the job, it seems the specified number of CPUs (10 in the properties file, or the 27 slots in the script) is not being used by the program. I'm concerned about the running time. Could you please tell me how to reduce the runtime and make the job run as fast as possible?
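For reference, the CPU-related settings live in `interproscan.properties` in the install directory. A rough sketch of the relevant entries (the property names below are taken from the 5.x documentation as I understand it; check the file shipped with your version, since names vary between releases):

```properties
# Embedded workers run inside the master JVM; the max bounds the CPU count used
number.of.embedded.workers=1
maxnumber.of.embedded.workers=10

# Cluster mode: tell InterProScan which grid engine submits the workers
grid.name=sge
```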

Thank you in advance

Tags: interproscan • domain • Pfam • ProDom
Answered 8.1 years ago by Michael 54k

This is somewhat normal, if I remember correctly. The number of CPUs you configure is a maximum, not a guaranteed load: it is not used at 100% at all times, and not all of the member-database analyses support parallel execution. Also, if the sequences have been analysed before, those sequences mostly generate network traffic rather than CPU load, because their results come from the precalculated-match lookup service.

If you want to achieve a higher average load, you can experiment with configuring, say, double the number of CPUs you actually have in the InterProScan config. Otherwise, if you need to estimate the runtime beforehand to reserve an appropriate slot in SGE, you can extrapolate the runtime from a smaller sample.
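The extrapolation could be sketched like this: pull the first 100 sequences out of the FASTA file, time InterProScan on just those, and scale linearly. The file names and the sample size of 100 are illustrative, and the timing line is commented out since it only makes sense on the cluster:

```shell
# Extract the first 100 sequences (lines from each ">" header up to the 100th record)
awk '/^>/{n++} n<=100' prot_seq.pep > sample_100.pep

# Time the same InterProScan command on the sample (uncomment on the cluster):
# time ./interproscan.sh -i sample_100.pep -t p -f xml -appl Pfam,ProDom -o sample_out.xml

# Rough estimate: (total sequences / 100) x the sample's wall time
TOTAL=$(grep -c '^>' prot_seq.pep)
echo "estimated full runtime ~ (${TOTAL}/100) x sample wall time"
```

Linear scaling is only a first approximation (per-sequence cost varies with sequence length and which analyses hit the lookup service), but it is usually good enough to pick a sensible queue and time limit.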

