Question: Strange errors running mpiBLAST
Asked 5.8 years ago by xapple (UU):

I've decided to use mpiBLAST on the cluster at my local university to run similarity searches between sequences. This is practical because the individual nodes on the cluster do not have enough RAM to hold all of the reference sequences. Since mpiBLAST divides the reference sequences across nodes (instead of dividing the query sequences), it is a perfect way to avoid hitting swap space.

But I am unable to get it to work. If you have any ideas on how to correct the error I'm getting, I would be very grateful. Here is how it is set up. First, I put this in my ~/.ncbirc file:

[NCBI]
Data=/bubo/sw/apps/bioinfo/blast/2.2.24/data/
### Data - this is where BLAST grabs the scoring matrices;
### any "data" dir in the BLAST releases on kalkyl should do fine

[BLAST]
BLASTDB=/bubo/nobackup/uppnex/blast_databases
BLASTMAT=/bubo/sw/apps/bioinfo/blast/2.2.24/data/
### BLASTDB - you can just run blastall with the -d option, but if you want to use a
### specific database, you can give a directory here.
### Note that those databases do not work with mpiBLAST, though.
### BLASTMAT - in 99% of use cases the same as "Data" above, where the matrices are stored.

[mpiBLAST]
Shared=/bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb
Local=/bubo/home/h3/lucass/glob/test/mpiblast/local
### Shared - a dir where you want to read/write database files, typically somewhere under your glob
### Local - any dir readable by the nodes.

Then I generated a test database like this (I'm simply taking the start of the nt database):

$ cd /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb
$ head -100000 /bubo/nobackup/uppnex/blast_databases/fasta/nt.fasta > sequences.fasta
$ mpiformatdb -i sequences.fasta --nfrags 22 -p F
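Before submitting the job, it can be worth verifying that the fragments actually landed in the shared directory. A minimal sketch (count_fragments is a hypothetical helper, not part of mpiBLAST; the .nhr extension is the standard BLAST nucleotide header file, which also appears in the error log further down):

```shell
#!/bin/bash
# Count the per-fragment BLAST header (.nhr) files in a database directory.
# count_fragments is an illustrative helper, not an mpiBLAST command.
count_fragments() {
    # $1 = database directory, $2 = database name
    ls "$1"/"$2".*.nhr 2>/dev/null | wc -l
}

# With --nfrags 22 we expect 22 fragments, numbered 000 through 021,
# in the shared directory from the question.
count_fragments /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb sequences.fasta
```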

This operation completes successfully. Finally, I submitted the following SLURM script with sbatch to query the database on three nodes (each node has eight processors):

#!/bin/bash -l
#SBATCH -D /bubo/home/h3/lucass/glob/test/mpiblast/query
#SBATCH -J test_mpiblast
#SBATCH -o test_mpiblast.out
#SBATCH -t 15:00
#SBATCH -p node -n 24

# Modules #
module load mpiblast

# Make test query #
head -4 /bubo/nobackup/uppnex/blast_databases/fasta/nt.fasta > query.fasta

# Run BLAST #
mpirun -np 24 mpiblast -p blastn -d sequences.fasta -i query.fasta -o query.xml -b 7

The error output on standard out is the following:

mod: loaded OpenMPI 1.4.5, compiled with gcc4.6 (found in /opt/openmpi/1.4.5gcc4.6/)
cp command failed!
command: cp /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.012.nhr /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.012.nhr
source = /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.012.nhr
dest = /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.012.nhr
ret_value = 32512
cp command failed!
command: cp /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.012.nin /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.012.nin
source = /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.012.nin
dest = /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.012.nin
ret_value = 32512
cp command failed!
command: cp /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.012.nsq /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.012.nsq
source = /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.012.nsq
dest = /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.012.nsq
ret_value = 32512
cp command failed!
command: cp /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.012.nnd /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.012.nnd
source = /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.012.nnd
dest = /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.012.nnd
ret_value = 32512
cp command failed!
command: cp /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.012.nni /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.012.nni
source = /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.012.nni
dest = /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.012.nni
ret_value = 32512
cp command failed!
command: cp /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.012.nsd /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.012.nsd
source = /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.012.nsd
dest = /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.012.nsd
ret_value = 32512
-------SNIP---------
cp command failed!
command: cp /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.013.nsi /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.013.nsi
source = /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.013.nsi
dest = /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.013.nsi
ret_value = 32512
[15]                1.114980    (15) unable to copy fragment!
cp command failed!
command: cp /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.008.nsd /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.008.nsd
source = /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.008.nsd
dest = /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.008.nsd
ret_value = 32512
cp command failed!
command: cp /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.008.nsi /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.008.nsi
source = /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.008.nsi
dest = /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.008.nsi
ret_value = 32512
[9]             1.115341    (9) unable to copy fragment!
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 14 in communicator MPI_COMM_WORLD 
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 14 with PID 12625 on
node q175 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
0   1.18088 Bailing out with signal 15
1   1.17914 Bailing out with signal 15
2   1.18093 Bailing out with signal 15
3   1.17917 Bailing out with signal 15
4   1.18096 Bailing out with signal 15
5   1.17923 Bailing out with signal 15
6   1.18096 Bailing out with signal 15
7   1.18104 Bailing out with signal 15
[q164.uppmax.uu.se:15654] 10 more processes have sent help message help-mpi-api.txt / mpi-abort
[q164.uppmax.uu.se:15654] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

It seems very strange that a simple copy command would fail. Running the same copy commands in a shell works fine, even when logged into one of the cluster nodes. Any ideas?
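For what it's worth, the ret_value that mpiblast prints looks like a raw wait()-style status, and decoding it narrows the failure down. A minimal sketch (interpreting 32512 this way is my reading of the log, not something mpiBLAST documents):

```shell
#!/bin/bash
# Decode a raw status as returned by system()/wait(): the child's exit
# code lives in bits 8-15, any terminating signal in the low 7 bits.
ret_value=32512            # value from the log above
exit_code=$(( (ret_value >> 8) & 0xFF ))
signal=$(( ret_value & 0x7F ))
echo "exit code: $exit_code, signal: $signal"   # prints: exit code: 127, signal: 0
# An exit code of 127 is the shell's "command not found", suggesting that
# the cp invocation is running in an environment where cp cannot be found.
```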

xapple answered, 5.8 years ago:

OK, this was fairly simple in the end. There is a --copy-via=mpi option to add (which, as far as I can tell, transfers the database fragments over MPI instead of shelling out to cp). Like this:

 mpirun -np 24 mpiblast --copy-via=mpi -p blastn -d sequences.fasta -i query.fasta -o query.xml -b 7
Well, it is not fair; I can upvote your answer, though.

— Pavel Senin, 5.8 years ago
I agree that voting on one's own post is not fair. But if you look at Stack Exchange, you can always accept your own answer.

— xapple, 5.7 years ago

Apparently, I can't accept my own answer! :(

— xapple, 5.8 years ago
Pavel Senin answered, 5.8 years ago:

I would check the file and partition sharing permissions (maybe it is mounted for the current user only?).
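For example, something along these lines, run from a compute node rather than the login node, since mounts can differ between the two (check_dir is just an illustrative helper; the paths are the ones from the question):

```shell
#!/bin/bash
# check_dir: report whether a directory is readable and writable by the
# current user -- a quick way to spot per-user mount or permission problems.
check_dir() {
    ls -ld "$1" 2>/dev/null || echo "$1: does not exist or is not listable"
    if [ -r "$1" ] && [ -w "$1" ]; then
        echo "$1: OK"
    else
        echo "$1: not readable/writable by $USER"
    fi
}

check_dir /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb
check_dir /bubo/home/h3/lucass/glob/test/mpiblast/local
```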

Powered by Biostar version 2.3.0