Question: ABySS run problems
Anand Rao wrote:

I am using my university's HPC cluster to de novo assemble paired-end HiSeq 4000 reads. I used Trimmomatic for adapter trimming and quality-based trimming, and BBMap's kmercountexact.sh to determine my preferred k-mer value for assembly. Now I want to use ABySS for my de novo assembly; the HPC uses the SLURM scheduler.

module load openmpi
Module openmpi/2.0.1 loaded 
module load abyss
Module abyss/1.9.0 loaded 

srun --partition=high --mem=24000 --time=12:00:00 --nodes=1 abyss-pe np=8 name=EthFoc-11_trimmomatic7_corr k=41 in='EthFoc-11_S285_L007_trimmomatic7_1P.00.0_0.cor.fastq EthFoc-11_S285_L007_trimmomatic7_2P.00.0_0.cor.fastq'

There is a lot of text in STDOUT, available at http://txt.do/d61vl. Briefly, the last lines in STDOUT that indicate the run was still working read as follows:

Assembling...
0: Assembled 51879929 k-mer in 81891 contigs.
Assembled 51879929 k-mer in 81891 contigs.
Concatenating fasta files to EthFoc-11_trimmomatic7_corr-1.fa
Concatenating fasta files to EthFoc-11_trimmomatic7_corr-bubbles.fa
Done.

But soon after this in STDOUT, the run terminates with the following error message:

Concatenating fasta files to EthFoc-11_trimmomatic7_corr-1.fa
error: `contigs-0.fa': No such file or directory
make: *** [EthFoc-11_trimmomatic7_corr-1.fa] Error 1
srun: error: c11-96: task 0: Exited with exit code 2

I went through several Biostars posts on ABySS run errors, but I don't think they offer a direct solution to my problem: abyss mpirun non zero code, abyss-pe without openmpi, Error running Abyss with openMPI, and Abyss-pe de-novo assembler error.

Could it be a shared-access misconfiguration on the HPCC, as in ABySS fails to write out coverage.hist file and stops?

The files generated from this run are listed below:

-rw-rw-r-- 1 aksrao aksrao   42 Aug 18 00:01 EthFoc-11_trimmomatic7_corr-1.dot
-rw-rw-r-- 1 aksrao aksrao    0 Aug 17 23:16 EthFoc-11_trimmomatic7_corr-1.fa
-rw-rw-r-- 1 aksrao aksrao 1.3M Aug 17 23:15 EthFoc-11_trimmomatic7_corr-bubbles.fa

To test whether removing MPI from the equation would allow the run to complete, I tried:

srun --partition=high --mem=24000 --time=12:00:00 --nodes=1 abyss-pe name=EthFoc-11_trimmomatic7_corr k=41 in='EthFoc-11_S285_L007_trimmomatic7_1P.00.0_0.cor.fastq EthFoc-11_S285_L007_trimmomatic7_2P.00.0_0.cor.fastq'

srun: job 13800791 queued and waiting for resources
srun: job 13800791 has been allocated resources

This time the error occurs almost right away, with STDOUT as follows:

abyss-filtergraph  --dot   -k41 -g EthFoc-11_trimmomatic7_corr-2.dot1 EthFoc-11_trimmomatic7_corr-1.dot EthFoc-11_trimmomatic7_corr-1.fa >EthFoc-11_trimmomatic7_corr-1.path
abyss-filtergraph: ../Graph/DotIO.h:302: std::istream& read_dot(std::istream&, Graph&, BetterEP) [with Graph = DirectedGraph<ContigProperties, Distance>; BetterEP = DisallowParallelEdges; std::istream = std::basic_istream<char>]: Assertion `num_vertices(g) > 0' failed.
/bin/bash: line 1: 27510 Aborted                 abyss-filtergraph --dot -k41 -g EthFoc-11_trimmomatic7_corr-2.dot1 EthFoc-11_trimmomatic7_corr-1.dot EthFoc-11_trimmomatic7_corr-1.fa > EthFoc-11_trimmomatic7_corr-1.path
make: *** [EthFoc-11_trimmomatic7_corr-1.path] Error 134
make: *** Deleting file `EthFoc-11_trimmomatic7_corr-1.path'
srun: error: c11-91: task 0: Exited with exit code 2

What am I doing wrong, and how can I fix it? Since I am only a couple of days into genome assembly and an hour into using ABySS, the more detailed your reply, the more useful it will be for me. Thanks!


Is the following requirement satisfied in your files?

A pair of reads must be named with the suffixes /1 and /2 to identify the first and second read, or the reads may be named identically. The paired reads may be in separate files or interleaved in a single file.
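
If you want to spot-check this against the files from your command (a quick illustration only, using your filenames and the standard head command), the first read header of each mate file should either end in /1 and /2 respectively or be identical:

# print the first read header of each mate file and compare the names
head -n 1 EthFoc-11_S285_L007_trimmomatic7_1P.00.0_0.cor.fastq
head -n 1 EthFoc-11_S285_L007_trimmomatic7_2P.00.0_0.cor.fastq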

(comment by genomax)

You can also post your question to the abyss user group.

(comment by st.ph.n)
benv answered:

Hi Anand,

I'm not sure what the problem is, but I can hopefully provide some hints.

It looks like your abyss-pe command is correct. I suspect your problems are related to your cluster job submission parameters. Learning to run MPI jobs on a cluster usually requires a bit of experimenting with job submission flags. If you have an IT department, ask them whether they have any example scripts showing how to run MPI jobs on your cluster. I would also recommend first testing that you can successfully run a simple MPI program before trying ABySS. For example, this page provides an MPI "Hello, World!" program: https://hpcc.usc.edu/support/documentation/examples-of-mpi-programs/. You would have to paste the code into a file and compile it yourself, but it is probably worth the effort.
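
To make that concrete, here is a rough, untested sketch of such a test (the file name hello_mpi.c is arbitrary, and whether plain srun or an mpirun call inside a batch script is the right launcher is exactly the site-specific detail to confirm with your IT staff):

module load openmpi

cat > hello_mpi.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);
    printf("%d of %d: running on host %s\n", rank, size, host);
    MPI_Finalize();
    return 0;
}
EOF

mpicc hello_mpi.c -o hello_mpi

# if 8 MPI ranks really start, you should see 8 distinct "x of 8" lines
srun --partition=high --nodes=1 --ntasks=8 ./hello_mpi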

In the log of a successful ABySS run, you should see multiple ABYSS-P processes ("ranks" in MPI terminology) running in parallel and communicating with each other; the processes can be running on different cluster nodes. Each MPI process (rank) writes its own temporary contigs-<rank>.fa file, so if you are running a job with 8 processes (np=8), you would expect to see the following in your assembly directory:

contigs-0.fa
contigs-1.fa
contigs-2.fa
contigs-3.fa
contigs-4.fa
contigs-5.fa
contigs-6.fa
contigs-7.fa

When the parallel runs of ABYSS-P finish, these files are concatenated together into a single FASTA file and then removed.

At the beginning of an ABYSS-P log for np=8, you should see something like:

0: Running on host c11-96
1: Running on host c11-96
2: Running on host c11-96
3: Running on host c11-96
4: Running on host c11-96
5: Running on host c11-96
6: Running on host c11-96
7: Running on host c11-96

whereas in http://textuploader.com/d61vl, you are just seeing:

0: Running on host c11-96

in multiple runs of ABYSS-P.

It appears that multiple independent ABySS jobs are being started rather than a single job with 8 processes ("ranks"). If they are all running in the same directory, they will overwrite each other's contigs-0.fa files. Also, each independent job will delete its contigs-0.fa file once the concatenation step is finished, which is likely why you are seeing:

error: `contigs-0.fa': No such file or directory
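
As a starting point for experimenting, a batch-script version of your run that asks SLURM for 8 tasks on one node might look something like the sketch below. The #SBATCH values are simply copied from your srun command, and whether this is the MPI launch method your site recommends is the part to verify with your IT staff:

#!/bin/bash
#SBATCH --partition=high
#SBATCH --nodes=1
#SBATCH --ntasks=8          # 8 MPI slots to match np=8
#SBATCH --mem=24000
#SBATCH --time=12:00:00

module load openmpi
module load abyss

abyss-pe np=8 k=41 name=EthFoc-11_trimmomatic7_corr \
    in='EthFoc-11_S285_L007_trimmomatic7_1P.00.0_0.cor.fastq EthFoc-11_S285_L007_trimmomatic7_2P.00.0_0.cor.fastq'

You would save this to a file and submit it with sbatch, rather than launching abyss-pe directly with srun.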

Thank you, benv. I will look into the syntax for MPI jobs in general on our HPCC, and specifically for ABySS.

BTW, what is the location of this assembly directory? Is it the pwd with the input files (which I have permission to write to), or the ABySS executable directory common to all HPCC users (which I do not have permission for)? I thought it doesn't hurt to ask.

If the latter, then I need to redirect or rename the assembly directory so that permissions are not a problem. But it looks like the more likely problem is what you've outlined above: the difference between the observed vs. expected ABYSS-P log. I'll post an update after this weekend.

(reply by Anand Rao)

The assembly directory is the working directory for your cluster job, which is usually just the directory where you ran your job submission command (i.e. sbatch/srun). If you're not sure what directory your job is running in, you can just put a pwd command at the top of your job script.
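
For instance (just a sketch of the idea; the rest of the script stays whatever you already have):

#!/bin/bash
# first command in the job script: record the job's working directory,
# i.e. the assembly directory that the contigs-*.fa files are written to
pwd
# ... the module load and abyss-pe commands follow here ...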

I agree, it doesn't look like a file permissions problem.

Good luck!

(reply by benv)

I am deleting my comment here and posting it as a new thread at Running ABySS at k-mer > 97.

(reply by Anand Rao)