Question: SPAdes hybrid assembly bailing at last step - Error code -10?
kevbonham wrote:

I'm attempting de novo hybrid assemblies from paired-end Illumina data and PacBio long-read data using SPAdes (3.5.0 Darwin, on OS X 10.10.3). I have 11 different bacterial species, all of which were sequenced the same way at the same time. For some of these species the assembler runs through perfectly with default parameters, so I believe I've got it set up correctly, but for others it runs through most of the process and then spits out an error code:

`== Error ==  system call for: "['home/path/SPAdes-3.5.0-Darwin/bin/spades', 'project/path/K55/configs/']" finished abnormally, err code: -10`

This occurs near the end of the K55 assembly step. I can't find anything obviously different between the samples that work and the samples that don't, and I can't find any documentation on this particular error code. Can anyone help me figure out what the problem is, or suggest a fix?

software error assembly genome
modified 4.1 years ago • written 4.1 years ago by kevbonham
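One way to read the `err code: -10` in the message above: assuming the SPAdes wrapper script reports the return code of the `spades` binary the way Python's `subprocess` module does (an assumption, not something confirmed in this thread), a negative value -N means the process was terminated by signal N. Signal numbers are platform-specific, and on Darwin signal 10 is SIGBUS. The sketch below decodes such a code; `describe_return_code` and `DARWIN_SIGNALS` are illustrative names, not part of SPAdes.

```python
import signal

# A few signal numbers from Darwin's <sys/signal.h>. They differ from Linux,
# where 10 is SIGUSR1 and SIGBUS is 7.
DARWIN_SIGNALS = {6: "SIGABRT", 9: "SIGKILL", 10: "SIGBUS", 11: "SIGSEGV"}


def describe_return_code(returncode, table=None):
    """Explain a subprocess-style return code.

    A negative value -N means the child was terminated by signal N.
    `table` optionally maps signal numbers to names for another platform;
    without it, the numbering of the machine running this script is used.
    """
    if returncode == 0:
        return "exited normally"
    if returncode > 0:
        return f"exited with status {returncode}"
    signum = -returncode
    if table is not None:
        name = table.get(signum, f"signal {signum}")
    else:
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
    return f"terminated by {name}"


print(describe_return_code(-10, DARWIN_SIGNALS))
```

The signal name only says how the process died, not why, so it narrows the search without identifying the cause.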

I don't remember whether I have encountered this error before, but SPAdes often stops for me without completing the assembly and then finishes successfully when I run it again with the --continue parameter (sometimes I have to run with --continue more than once). It is worth a try.

Though I have never asked the authors about SPAdes, I have emailed them with questions about other software from their group and always got helpful replies. That is another thing you should try.

written 4.1 years ago by h.mon
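The "run again with --continue, possibly more than once" suggestion above can be sketched as a small retry loop. This is a minimal sketch, assuming `spades.py` is on the PATH and that the output directory is the one from the interrupted run; `run_with_continue`, `max_attempts`, and `runner` are hypothetical names introduced here, and `project/path` is the placeholder from the error message.

```python
import subprocess


def run_with_continue(output_dir, max_attempts=3, runner=subprocess.call):
    """Re-run `spades.py --continue -o <output_dir>` until it returns 0.

    Returns the number of attempts used, or None if every attempt failed.
    `runner` is injectable so the loop can be tried without SPAdes installed.
    """
    cmd = ["spades.py", "--continue", "-o", output_dir]
    for attempt in range(1, max_attempts + 1):
        returncode = runner(cmd)
        if returncode == 0:
            return attempt
        print(f"attempt {attempt} failed with return code {returncode}")
    return None


# Dry run with a stand-in runner that fails once and then succeeds,
# so nothing is actually executed here.
fake_codes = iter([-10, 0])
attempts = run_with_continue("project/path", runner=lambda cmd: next(fake_codes))
print(f"succeeded after {attempts} attempt(s)")
```

For a real run, drop the `runner` argument so the default `subprocess.call` launches SPAdes. Capping the attempts matters: if the failure is deterministic, retrying forever will not help.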

Cool - I'm giving this a shot now. I also e-mailed the authors, and they asked about how much RAM I have. Strange that I'd run into memory issues with some assemblies and not others, but we'll see what they say. I'm trying the run again with `--continue` with nothing but my browser open. *fingers crossed*

written 4.1 years ago by kevbonham

It is expected that different datasets will require different amounts of memory, even if the input data are the same size. Many factors affect how much memory is required, such as genome size, genome (or sample) complexity, sequencing quality, coverage, adapter contamination, and others. Two that may matter in your case: 1. Did you check for adapter contamination? 2. Are you sure you do not have more than one species in a sample (contamination)?

How much memory do you have? I have had some runs that used more than 16 GB of memory.

written 4.1 years ago by h.mon

A fair point. I've got 16 GB of memory. I can't be 100% sure I don't have sample contamination, but I was careful. How would you check for adapter contamination? The sequencing facility we used did some pre-processing of the data (the sequencing runs were multiplexed, so they sorted by barcode, etc.), and I assumed they checked, but it wouldn't hurt to check myself.

In case it's memory, I managed to get our HPCC to install SPAdes, so I'll upload the files and see if that works - I can allocate quite a bit more memory there.

written 4.1 years ago by kevbonham
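For the rerun on a machine with more memory, SPAdes takes its memory limit in GB through `-m` and the thread count through `-t`. A minimal sketch of assembling such a hybrid command line follows; the file names, the 64 GB figure, and the helper name `spades_hybrid_command` are all hypothetical, chosen only for illustration.

```python
def spades_hybrid_command(r1, r2, pacbio, output_dir, memory_gb, threads=8):
    """Build a spades.py argument list for a paired-end + PacBio hybrid run.

    memory_gb is passed to -m, the limit (in GB) at which SPAdes stops;
    it does not reserve memory, so the job allocation must cover it.
    """
    if memory_gb <= 0:
        raise ValueError("memory_gb must be positive")
    return [
        "spades.py",
        "-1", r1,            # forward Illumina reads
        "-2", r2,            # reverse Illumina reads
        "--pacbio", pacbio,  # PacBio long reads
        "-o", output_dir,
        "-m", str(memory_gb),
        "-t", str(threads),
    ]


# Hypothetical file names; print the command rather than running it.
cmd = spades_hybrid_command(
    "sample_R1.fastq", "sample_R2.fastq", "sample_pacbio.fastq",
    "sample_out", memory_gb=64,
)
print(" ".join(cmd))
```

Building the command as a list makes it easy to pass to `subprocess.call` or to write into a scheduler script once the paths are filled in.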

I have not tested extensively, but I found MGA to be the most sensitive program for detecting adapter contamination. You could also use BBDuk to remove just the adapters, without any other filter (quality, length). At the end it will tell you how many reads were removed due to adapter contamination, and your reads will be clean.

modified 4.1 years ago • written 4.1 years ago by h.mon
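The adapter-only BBDuk pass suggested above might look like the sketch below. The parameter values (`ktrim=r k=23 mink=11 hdist=1 tpe tbo`) are the ones commonly shown for adapter trimming in the BBDuk documentation, not settings given in this thread; the file names and the helper `bbduk_adapter_command` are hypothetical, and the adapter FASTA path has to point at the `adapters.fa` that ships with BBTools or at your own file.

```python
def bbduk_adapter_command(in1, in2, out1, out2, adapters, stats="bbduk_stats.txt"):
    """Build a bbduk.sh argument list that trims adapters from paired reads.

    No quality-trimming option is set, so the run only acts on adapter matches.
    """
    return [
        "bbduk.sh",
        f"in1={in1}", f"in2={in2}",
        f"out1={out1}", f"out2={out2}",
        f"ref={adapters}",   # FASTA of adapter sequences
        "ktrim=r",           # trim the match and everything to its right
        "k=23", "mink=11",   # k-mer size, shorter k-mers allowed at read ends
        "hdist=1",           # allow one mismatch
        "tpe", "tbo",        # trim pairs to equal length, trim by overlap
        f"stats={stats}",    # write per-adapter match counts to this file
    ]


# Hypothetical file names; print the command rather than running it.
cmd = bbduk_adapter_command(
    "sample_R1.fastq", "sample_R2.fastq",
    "clean_R1.fastq", "clean_R2.fastq",
    "adapters.fa",
)
print(" ".join(cmd))
```

The counts BBDuk reports at the end of the run, together with the `stats` file, give a quick indication of whether adapter contamination is large enough to matter.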

Alas, no success. I ran it twice more with every other program closed, and it still bails at the end with the same error code.

written 4.1 years ago by kevbonham