SPAdes hybrid assembly bailing at last step - Error code -10?
5.7 years ago
kevbonham ▴ 10

I'm attempting de novo hybrid assemblies with paired-end Illumina data and PacBio long-read data using SPAdes (3.5.0 Darwin, on OS X 10.10.3). I have 11 different bacterial species, all of which were sequenced the same way at the same time. For some of these species, the assembler runs through perfectly (using default parameters), so I believe I've got it set up correctly. For others, it runs through most of the process and then spits out an error code:

== Error ==  system call for: "['home/path/SPAdes-3.5.0-Darwin/bin/spades', 'project/path/K55/configs/config.info']" finished abnormally, err code: -10

This occurs near the end of the K55 assembly step. I can't find anything obviously different between the samples that work and the samples that don't, and I can't find any documentation on this particular error code. Can anyone help figure out what the problem is, or suggest a fix?
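A negative code like this may not be a SPAdes-specific error at all. Below is a minimal sketch, assuming the spades.py wrapper reports the child process's return code the way Python's `subprocess` module does, where a negative value -N means "terminated by signal N". The helper name `describe_return_code` is hypothetical. Signal numbering is platform-specific (signal 10 should be SIGBUS on OS X but SIGUSR1 on Linux), so the lookup is only meaningful when run on the machine that produced the error.

```python
import signal


def describe_return_code(code: int) -> str:
    """Describe a subprocess-style return code.

    Assumption: negative values follow Python's subprocess convention,
    i.e. -N means the child was terminated by signal N.
    """
    if code >= 0:
        return f"exited with status {code}"
    signum = -code
    try:
        # Look the number up in this platform's signal table.
        name = signal.Signals(signum).name
    except ValueError:
        name = "unknown signal"
    return f"terminated by signal {signum} ({name})"


if __name__ == "__main__":
    # The code from the error message above; the name depends on the OS.
    print(describe_return_code(-10))
```

If the name that comes back is a memory-related signal, that would at least be consistent with the process dying from a resource problem rather than from bad input.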

Assembly genome software error • 4.0k views

I don't remember whether I've encountered this particular error before, but SPAdes often stops on me without completing the assembly. Running it again with the --continue parameter usually lets it finish successfully (sometimes I have to run with --continue more than once). It's worth a try.
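The rerun can be scripted so it retries a few times unattended. This is a rough sketch, assuming `spades.py` is on your PATH and that `--continue` is given together with only `-o <output_dir>`; the function names, the attempt limit, and the `project/path` placeholder are made up for illustration.

```python
import subprocess
from typing import Callable, List


def continue_command(output_dir: str) -> List[str]:
    """Build the command that resumes a SPAdes run from its last checkpoint."""
    return ["spades.py", "--continue", "-o", output_dir]


def rerun_until_done(
    output_dir: str,
    max_attempts: int = 3,
    run: Callable[[List[str]], int] = subprocess.call,
) -> int:
    """Retry with --continue; return the attempt that succeeded, or 0 if none did."""
    for attempt in range(1, max_attempts + 1):
        code = run(continue_command(output_dir))
        print(f"attempt {attempt}: return code {code}")
        if code == 0:
            return attempt
    return 0


if __name__ == "__main__":
    # "project/path" is a placeholder for the original SPAdes output directory.
    rerun_until_done("project/path")
```

If every attempt fails at the same stage with the same code, retrying further is unlikely to help and the cause is probably something systematic such as memory.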

Though I've never asked the authors about SPAdes itself, I have mailed them with questions about other software from their group and always got helpful replies. That is another thing you should try.


Cool - I'm giving this a shot now. I also e-mailed the authors, and they asked about how much RAM I have. Strange that I'd run into memory issues with some assemblies and not others, but we'll see what they say. I'm trying the run again with --continue with nothing but my browser open. *fingers crossed*


It is expected that different datasets will require different amounts of memory, even if the input data are the same size. Many factors affect how much memory is required, such as genome size, genome (or sample) complexity, sequencing quality, coverage, adapter contamination, and others. Two questions that may be relevant to your case: 1. Did you check for adapter contamination? 2. Are you sure you do not have more than one species in a sample (contamination)?

How much memory do you have? I've had some runs that used more than 16 GB of memory.


A fair point. I've got 16 GB of memory. I can't be 100% sure I don't have sample contamination, but I was careful. How would you check for adapter contamination? The sequencing facility we used did some pre-processing of the data (sequence runs were multiplexed, so they sorted by barcode, etc.), and I assumed they checked, but it wouldn't hurt to check myself.

In case it's memory, I managed to get our HPCC to install SPAdes, so I'll upload the files and see if that works - I can allocate quite a bit more memory there.
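For the cluster run, the memory ceiling can be raised explicitly. A hedged sketch of how the hybrid command might be assembled, using the SPAdes options `-1`/`-2` for paired-end reads, `--pacbio` for the long reads, `-o` for the output directory, and `-m` for the memory limit in GB. All file names and the 64 GB figure are placeholders, not values from this thread.

```python
from typing import List


def hybrid_command(
    reads_1: str,
    reads_2: str,
    pacbio_reads: str,
    output_dir: str,
    memory_gb: int,
) -> List[str]:
    """Build a SPAdes hybrid assembly command with an explicit memory limit."""
    return [
        "spades.py",
        "-1", reads_1,             # forward Illumina reads
        "-2", reads_2,             # reverse Illumina reads
        "--pacbio", pacbio_reads,  # PacBio long reads
        "-o", output_dir,
        "-m", str(memory_gb),      # memory limit in GB
    ]


if __name__ == "__main__":
    # Hypothetical file names; substitute the real inputs for each species.
    cmd = hybrid_command(
        "sample_R1.fastq", "sample_R2.fastq", "sample_pacbio.fastq",
        "sample_spades_out", memory_gb=64,
    )
    print(" ".join(cmd))
```

The value passed to `-m` should not exceed what the scheduler actually allocates to the job.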


I have not tested this extensively, but I found MGA to be the most sensitive program for detecting adapter contamination. You could also use BBDuk to remove adapters without applying any other filter (quality, length). At the end it reports how many reads were removed due to adapter contamination, and your reads will be clean.
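An adapter-only BBDuk pass could look roughly like the sketch below. It assumes `bbduk.sh` is on your PATH and uses the adapter-trimming parameters from the BBDuk usage guide (`ktrim=r k=23 mink=11 hdist=1 tpe tbo`). The file names and the path to the adapter reference are placeholders; BBMap ships an `adapters.fa` in its resources directory, but its location depends on your install.

```python
from typing import List


def bbduk_adapter_trim_command(
    reads_1: str,
    reads_2: str,
    out_1: str,
    out_2: str,
    adapters: str,
) -> List[str]:
    """Build a BBDuk command that trims adapters only (no quality or length filter)."""
    return [
        "bbduk.sh",
        f"in1={reads_1}", f"in2={reads_2}",
        f"out1={out_1}", f"out2={out_2}",
        f"ref={adapters}",  # adapter sequences to match against
        "ktrim=r",          # trim the matched k-mer and everything to its right
        "k=23", "mink=11",  # k-mer size, with shorter k-mers allowed at read ends
        "hdist=1",          # tolerate one mismatch
        "tpe", "tbo",       # trim pairs to equal length; trim by pair overlap
    ]


if __name__ == "__main__":
    # Placeholder paths; point "adapters" at the adapters.fa from your BBMap install.
    cmd = bbduk_adapter_trim_command(
        "sample_R1.fastq", "sample_R2.fastq",
        "clean_R1.fastq", "clean_R2.fastq",
        "adapters.fa",
    )
    print(" ".join(cmd))
```

The summary BBDuk prints at the end of the run is where the adapter counts show up.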


Alas, no success. I ran it twice more with every other program closed, and it still bails at the end with the same error code.