7.1 years ago by
Seattle, WA USA
Following up on Mikael's answer, you may have reasons for doing what you're doing. If you have access to a computational cluster, you might look into compiling meme_p, a variant of meme that incorporates OpenMPI components to spread out the work on multiple nodes.
You might build it like so:
$ cd /home/foo/meme_4.9.0
$ ./configure \
You need to add the OpenMPI lib path to your LD_LIBRARY_PATH environment variable, _e.g._ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/openmpi-1.6.3/lib etc. in your environment setup. Your OpenMPI installation must also be present on or available to each cluster node.
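As a rough sketch of that environment setup, something like the following could go in your shell startup file (for example ~/.bashrc, assuming bash is your login shell). The helper name add_lib_path is made up for illustration, and the path is just the example path used above; adjust it to wherever OpenMPI lives on your cluster. The check avoids appending the same directory again each time the file is sourced.

```shell
# Example path from the text above; change it to match your OpenMPI install.
OPENMPI_LIB=/opt/openmpi-1.6.3/lib

# add_lib_path DIR (hypothetical helper): append DIR to LD_LIBRARY_PATH
# unless it is already listed there.
add_lib_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$1" ;;
    esac
    export LD_LIBRARY_PATH
}

add_lib_path "$OPENMPI_LIB"
echo "$LD_LIBRARY_PATH"
```

Since the compute nodes need the same setting, it is worth confirming that whatever file you put this in is actually read by jobs started on those nodes.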
A Sun Grid Engine-based script called runall.cluster would fire off the search as follows, supposing your sys admin has set up a parallel environment called mpi_pe (for example) with at least 64 slots:
#$ -N memeCluster64
#$ -S /bin/bash
#$ -pe mpi_pe 64
#$ -v -np=64
#$ -o "memeCluster64.out"
#$ -e "memeCluster64.err"
time /opt/openmpi-1.6.3/bin/mpirun \
-np 64 \
-oc /home/foo/meme_4.9.0/output/myReads.fa.meme \
-nmotifs 30 \
-maxsize 100000000 \
To run it:
$ qsub ./runall.cluster
In our environment, testing showed immediate benefit with as few as 8 or 16 nodes, with diminishing returns after about 32-64 nodes. You could use GNU time to do the same runtime testing on your end, i.e., measuring execution time vs nodes on a small test sequence set, in order to find a "sweet spot" where your job will run faster without taking up too much of the cluster.
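One way to organize that kind of test is a small loop that records wall-clock time per process count. The sketch below is only an illustration: time_run is a made-up helper, it uses date +%s (supported by GNU and BSD date) instead of GNU time so it has no extra dependencies, and sleep 1 is a stand-in workload so the loop runs anywhere. You would swap in your own mpirun invocation on a small test sequence set.

```shell
# time_run LABEL CMD [ARGS...] (hypothetical helper): run CMD, discard its
# output, and print "LABEL<TAB>elapsed seconds" (whole-second resolution).
time_run() {
    label=$1; shift
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    printf '%s\t%s\n' "$label" "$((end - start))"
}

# Stand-in workload: replace `sleep 1` with your real command, e.g.
#   time_run "$np" /opt/openmpi-1.6.3/bin/mpirun -np "$np" <your meme_p command>
for np in 8 16 32 64; do
    time_run "$np" sleep 1
done
```

Each output line pairs a process count with its runtime, so the point where adding nodes stops paying off should be easy to spot. On a busy cluster, repeating each measurement a few times would give a more reliable picture.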