I'm attempting de novo hybrid assemblies with paired-end Illumina data and PacBio long-read data using SPAdes (3.5.0 Darwin, on OS X 10.10.3). I have 11 different bacterial species, all of which were sequenced the same way at the same time. For some of these species, the assembler runs through perfectly (using default parameters), so I believe I've got it set up correctly, but for others it runs through most of the process and then spits out an error code:
== Error == system call for: "['home/path/SPAdes-3.5.0-Darwin/bin/spades', 'project/path/K55/configs/config.info']" finished abnormally, err code: -10
This occurs near the end of the k55 assembly step. I can't find anything obviously different between the samples that work and the samples that don't, and can't find any documentation on this particular error code. Can anyone help figure out what the problem is/figure out a fix?
I don't remember whether I have encountered this particular error before, but SPAdes often stops on me without completing the assembly; after running it again with the
--continue
parameter, it finishes successfully (sometimes I have to run with --continue
more than once). It is worth a try. Although I have never asked the authors about SPAdes itself, I have mailed them with questions about other software from their group and always got helpful replies, so that is another thing you should try.
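If re-running by hand gets tedious, the retry can be scripted. This is only a minimal sketch, assuming `spades.py` is on your PATH and that the output directory already holds the interrupted run; the function name `run_until_done` and its `runner` parameter are my own, not part of SPAdes.

```python
import subprocess


def run_until_done(out_dir, max_attempts=3, runner=subprocess.call):
    """Re-run `spades.py --continue -o out_dir` until it exits with 0.

    Returns the attempt number that succeeded, or None if every
    attempt failed. `runner` is injectable so the loop can be tried
    without SPAdes installed (hypothetical helper, not a SPAdes API).
    """
    cmd = ["spades.py", "--continue", "-o", out_dir]
    for attempt in range(1, max_attempts + 1):
        code = runner(cmd)  # exit status of this attempt
        if code == 0:
            return attempt
    return None


# Example (not executed here; the directory name is a placeholder):
#   run_until_done("my_assembly_out")
```

If the run dies at the same point every time, as opposed to finishing on the second or third try, restarting is unlikely to help and the cause is worth chasing instead.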
Cool - I'm giving this a shot now. I also e-mailed the authors, and they asked about how much RAM I have. Strange that I'd run into memory issues with some assemblies and not others, but we'll see what they say. I'm trying the run again with
--continue
with nothing but my browser open. Fingers crossed.
It is expected that different datasets will require different amounts of memory, even if the input data are the same size. Many factors determine how much memory is required, such as genome size, genome (or sample) complexity, sequencing quality, coverage, adapter contamination, and others. Two questions that may matter in your case: 1. Did you check for adapter contamination? 2. Are you sure you do not have more than one species in a sample (contamination)?
How much memory do you have? I have had some runs that used more than 16 GB of memory.
A fair point. I've got 16 GB of memory. I can't be 100% sure I don't have sample contamination, but I was careful. How would you check for adapter contamination? The sequencing facility we used did some pre-processing of the data (the sequencing runs were multiplexed, so they sorted by barcode, etc.), and I assumed they checked, but it wouldn't hurt to check myself.
In case it's memory, I managed to get our HPCC to install SPAdes, so I'll upload the files and see if that works - I can allocate quite a bit more memory there.
I have not tested many tools, but I found MGA to be the most sensitive program for detecting adapter contamination. You could also use BBDuk to remove adapters only, without any other filter (quality, length); at the end it will tell you how many reads were removed due to adapter contamination, and your reads will be clean.
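For the BBDuk route, here is a rough sketch of an adapter-only trimming command, built in Python so it can be dropped into a script. The settings (`ktrim=r k=23 mink=11 hdist=1 tpe tbo`) are the ones the BBDuk guide suggests for adapter trimming; the file names and the helper name `bbduk_adapter_trim_cmd` are placeholders of mine, and `adapters.fa` is assumed to be the adapter file that ships in the BBTools `resources` directory.

```python
def bbduk_adapter_trim_cmd(r1, r2, out1, out2, adapters="adapters.fa"):
    """Build a BBDuk command that trims adapters from paired reads.

    No quality or length filtering is requested, only k-mer based
    adapter trimming on the right (3') end of each read.
    All paths are supplied by the caller (placeholders in the example).
    """
    return [
        "bbduk.sh",
        "in1=" + r1, "in2=" + r2,        # paired-end input
        "out1=" + out1, "out2=" + out2,  # trimmed output
        "ref=" + adapters,               # adapter sequences to match
        "ktrim=r", "k=23", "mink=11", "hdist=1",
        "tpe", "tbo",                    # pair-aware trimming options
    ]


cmd = bbduk_adapter_trim_cmd("reads_R1.fq", "reads_R2.fq",
                             "clean_R1.fq", "clean_R2.fq")
print(" ".join(cmd))  # paste into a shell, or pass `cmd` to subprocess.call
```

The summary BBDuk prints when it finishes is where the count of adapter-trimmed reads shows up.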
Alas, no success. I ran it twice more with every other program closed, and it still bails at the end with the same error code.
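On the error code itself: Python's `subprocess` module reports a negative return code -N when the child process was terminated by signal N. Assuming `spades.py` passes that value through unchanged (I have not checked the 3.5.0 source), "err code: -10" would mean the `spades` binary was killed by signal 10, which is SIGBUS (bus error) on OS X. Note that signal numbers differ between systems; on Linux, 10 is SIGUSR1. The small helper below, whose name is my own, decodes a return code on whatever machine it is run on.

```python
import signal


def describe_returncode(code):
    """Translate a subprocess return code into a readable message.

    Non-negative codes are ordinary exit statuses; a negative code -N
    means the process was terminated by signal N on this platform.
    """
    if code >= 0:
        return "exited with status %d" % code
    try:
        name = signal.Signals(-code).name  # platform-specific lookup
    except ValueError:
        name = "unknown signal"
    return "killed by signal %d (%s)" % (-code, name)


# Run this on the machine where SPAdes failed, since the
# number-to-name mapping depends on the operating system.
print(describe_returncode(-10))
```

Either way, a signal means the binary crashed rather than stopped on its own, so the tail of the log for the K55 step is probably worth sending to the authors along with the RAM figure.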