Question

De novo assembly with SPAdes: ERROR K-mer Counting: too many k-mers to fit into available memory limit


7.2 years ago

ambi1999

Hi,

I am getting the following error while doing a de novo assembly with SPAdes on a Linux machine with 15 GB of RAM and more than 50 GB of free disk space. The two input FASTQ files are about 2.4 GB and 2 GB.

ERROR K-mer Counting The reads contain too many k-mers to fit into available memory limit. Increase memory limit and restart

spades.log:

Command line: spades.py --careful -o WT_ -1 firstfile.fq -2 secondfile.fq -m 10

System information:
  SPAdes version: 3.5.0
  Python version: 2.7.12
  OS: Linux-4.4.0-59-generic-x86_64-with-Ubuntu-16.04-xenial

Output dir: SPAdes-3.5.0-Linux/bin/WT_
Mode: read error correction and assembling
Debug mode is turned OFF

Dataset parameters:
  Multi-cell mode (you should set '--sc' flag if input data was obtained with MDA (single-cell) technology
  Reads:
    Library number: 1, library type: paired-end
      orientation: fr
      left reads: ['firstfile.fq']
      right reads: ['secondfile.fq']
      interlaced reads: not specified
      single reads: not specified
Read error correction parameters:
  Iterations: 1
  PHRED offset will be auto-detected
  Corrected reads will be compressed (with gzip)
Assembly parameters:
  k: automatic selection based on read length
  Mismatch careful mode is turned ON
  Repeat resolution is enabled
  MismatchCorrector will be used
  Coverage cutoff is turned OFF
Other parameters:
  Dir for temp files: tmp
  Threads: 16
  Memory limit (in Gb): 10


======= SPAdes pipeline started. Log can be found here: SPAdes-3.5.0-Linux/bin/WT_/spades.log


===== Read error correction started. 


== Running read error correction tool: SPAdes-3.5.0-Linux/bin/hammer SPAdes-3.5.0-Linux/bin/WT_/corrected/configs/config.info

   0:00:00.000    4M /    4M   INFO  General                 (main.cpp                  :  82)   Loading config from SPAdes-3.5.0-Linux/bin/WT_/corrected/configs/config.info
   0:00:00.000    4M /    4M   INFO  General                 (memory_limit.hpp          :  42)   Memory limit set to 10 Gb
   0:00:00.001    4M /    4M   INFO  General                 (main.cpp                  :  91)   Trying to determine PHRED offset
   0:00:00.001    4M /    4M   INFO  General                 (main.cpp                  :  97)   Determined value is 33
   0:00:00.002    4M /    4M   INFO  General                 (hammer_tools.cpp          :  36)   Hamming graph threshold tau=1, k=21, subkmer positions = [ 0 10 ]
     === ITERATION 0 begins ===
   0:00:00.002    4M /    4M   INFO K-mer Index Building     (kmer_index.hpp            : 467)   Building kmer index
   0:00:00.002    4M /    4M   INFO K-mer Splitting          (kmer_data.cpp             : 127)   Splitting kmer instances into 128 buckets. This might take a while.
   0:00:00.002    4M /    4M   INFO  General                 (file_limit.hpp            :  29)   Open file limit set to 1024
   0:00:00.002    4M /    4M   INFO K-mer Splitting          (kmer_data.cpp             : 145)   Memory available for splitting buffers: 0.416504 Gb
   0:00:00.002    4M /    4M   INFO K-mer Splitting          (kmer_data.cpp             : 153)   Using cell size of 436736
   0:00:00.857    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 167)   Processing firstfile.fq
   0:00:18.381    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 176)   Processed 813597 reads
   0:00:38.048    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 176)   Processed 1673452 reads
   0:00:57.634    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 176)   Processed 2519299 reads
   0:01:16.964    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 176)   Processed 3305418 reads
   0:01:37.462    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 176)   Processed 4168421 reads
   0:01:50.922    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 176)   Processed 4493764 reads
   0:01:50.922    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 167)   Processing secondfile.fq
   0:02:08.666    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 176)   Processed 5591263 reads
   0:02:35.935    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 176)   Processed 6636651 reads
   0:03:20.730    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 176)   Processed 8752362 reads
   0:03:25.466    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 181)   Processed 8987528 reads
   0:03:25.620   32M /    3G   INFO  General                 (kmer_index.hpp            : 345)   Starting k-mer counting.
   0:03:53.603   32M /    3G   INFO  General                 (kmer_index.hpp            : 351)   K-mer counting done. There are 418968448 kmers in total.
   0:03:53.603   32M /    3G   INFO  General                 (kmer_index.hpp            : 353)   Merging temporary buckets.
   0:04:11.857   32M /    3G   INFO K-mer Index Building     (kmer_index.hpp            : 476)   Building perfect hash indices
   0:06:34.813  160M /    7G   INFO  General                 (kmer_index.hpp            : 371)   Merging final buckets.
   0:06:50.161  160M /    7G   INFO K-mer Index Building     (kmer_index.hpp            : 515)   Index built. Total 144936940 bytes occupied (2.7675 bits per kmer).
   0:06:50.161  160M /    7G  ERROR K-mer Counting           (kmer_data.cpp             : 261)   The reads contain too many k-mers to fit into available memory limit. Increase memory limit and restart


== Error ==  system call for: "['SPAdes-3.5.0-Linux/bin/hammer', 'SPAdes-3.5.0-Linux/bin/WT_/corrected/configs/config.info']" finished abnormally, err code: 255

In case you have troubles running SPAdes, you can write to spades.support@bioinf.spbau.ru
Please provide us with params.txt and spades.log files from the output directory.
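For context, the k-mer figures in the log can be sanity-checked with a little arithmetic. The Python sketch below uses only numbers copied from the log: it confirms the reported 2.7675 bits per k-mer for the perfect-hash index, and estimates the space needed just to store 418,968,448 k-mers of length 21 at 2 bits per base. Packing each k-mer into one 64-bit word is an assumption for illustration, not a description of SPAdes internals; the error-correction tool (hammer) keeps additional per-k-mer data on top of this, so the sketch does not reproduce SPAdes's own memory estimate.

```python
# Numbers copied from the hammer log above.
TOTAL_KMERS = 418_968_448   # "There are 418968448 kmers in total."
INDEX_BYTES = 144_936_940   # "Total 144936940 bytes occupied"
K = 21                      # "k=21" on the Hamming graph line

def index_bits_per_kmer(index_bytes, n_kmers):
    """Bits used by the perfect-hash index per k-mer."""
    return index_bytes * 8 / n_kmers

def packed_kmer_bytes(n_kmers, k, word_bits=64):
    """Bytes to store n k-mers at 2 bits/base, each padded to whole machine words (assumption)."""
    words_per_kmer = (2 * k + word_bits - 1) // word_bits  # ceiling division
    return n_kmers * words_per_kmer * word_bits // 8

bits = index_bits_per_kmer(INDEX_BYTES, TOTAL_KMERS)
raw = packed_kmer_bytes(TOTAL_KMERS, K)
print(f"index: {bits:.4f} bits per k-mer")   # 2.7675, as in the log
print(f"k-mers alone: {raw / 1e9:.2f} GB")   # 3.35 GB
```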

params.txt:

Command line: spades.py --careful -o WT_ -1 firstfile.fq -2 secondfile.fq -m 10

System information:
  SPAdes version: 3.5.0
  Python version: 2.7.12
  OS: Linux-4.4.0-59-generic-x86_64-with-Ubuntu-16.04-xenial

Output dir: SPAdes-3.5.0-Linux/bin/
Mode: read error correction and assembling
Debug mode is turned OFF

Dataset parameters:
  Multi-cell mode (you should set '--sc' flag if input data was obtained with MDA (single-cell) technology
  Reads:
    Library number: 1, library type: paired-end
      orientation: fr
      left reads: ['firstfile.fq']
      right reads: ['secondfile.fq']
      interlaced reads: not specified
      single reads: not specified
Read error correction parameters:
  Iterations: 1
  PHRED offset will be auto-detected
  Corrected reads will be compressed (with gzip)
Assembly parameters:
  k: automatic selection based on read length
  Mismatch careful mode is turned ON
  Repeat resolution is enabled
  MismatchCorrector will be used
  Coverage cutoff is turned OFF
Other parameters:
  Dir for temp files: SPAdes-3.5.0-Linux/bin/WT_/tmp
  Threads: 16
  Memory limit (in Gb): 10

Thx for the help.

Cheers, Ambi.

Tags: denovo, assembly, spades



Hi Brian,

I have now managed to do the de novo assembly on my local machine by first normalising the reads with your BBNorm tool. It required far less memory (around 6 GB, compared with 15 GB without normalisation).
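Why normalisation helps: reads from regions that are already sequenced to a high depth add few new k-mers, only more copies, while the errors in those extra copies add new, spurious k-mers. The Python sketch below is a simplified single-pass "digital normalization" (in the style of C. Titus Brown's diginorm), not BBNorm's actual algorithm, and it does not model BBNorm's min= setting: a read is kept only while the median count of its k-mers, among reads kept so far, is below the target depth. The simulated genome and reads are made up for illustration.

```python
import random
from collections import Counter
from statistics import median

def kmers(read, k):
    """All overlapping k-mers of a read (no reverse-complement merging, for simplicity)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def normalize(reads, k=21, target=100):
    """Keep a read only if the median count of its k-mers among kept reads is below target."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k
        if median(counts[km] for km in read_kmers) < target:
            kept.append(read)
            counts.update(read_kmers)
    return kept

def simulate(genome_len=2000, read_len=100, n_reads=5000, seed=1):
    """Error-free reads sampled uniformly from a random genome (about 250x coverage by default)."""
    rng = random.Random(seed)
    genome = "".join(rng.choice("ACGT") for _ in range(genome_len))
    starts = [rng.randrange(genome_len - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]

reads = simulate()
kept = normalize(reads, k=21, target=20)
print(f"{len(reads)} reads in, {len(kept)} kept")
```

Most reads are dropped while nearly all distinct k-mers survive, which is why the downstream k-mer counting needs less memory.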

Many thanks for recommending BBNorm. I just noticed that you are the developer of this package as well, which is a great contribution to the community. Very well done, and thanks once again.

PS: I have another question about this project, but I will post it separately as it is not related to de novo assembly.

Cheers, Ambi.

Reply by ambi1999, 7.2 years ago


Hi Ambi,

I'm happy BBNorm was helpful in this case! To close this thread, if you feel it has in fact been resolved, please accept the answer.

Thanks, Brian

Reply by Brian Bushnell, 7.2 years ago


Just accepted the answer. Thx again for your help.

Ambi.

Reply by ambi1999, 7.2 years ago


7.2 years ago

shenwei356

ERROR K-mer Counting The reads contain too many k-mers to fit into available memory limit. Increase memory limit and restart

Command line: spades.py --careful -o WT_ -1 firstfile.fq -2 secondfile.fq -m 10

Choose a smaller maximum k-mer size (-k), or switch to a server with more RAM and disk space.
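A toy Python illustration of why k matters for memory, assuming uniformly random 1% substitution errors and no reverse-complement merging (real k-mer counters usually merge a k-mer with its reverse complement): each sequencing error creates up to k novel k-mers, so for k well below the read length the number of distinct k-mers, and the space to store them at about 2 bits per base, grows with k. The effect flattens as k approaches the read length, because fewer k-mer windows fit in each read. The genome, read length and error rate are made-up parameters.

```python
import random

def distinct_kmers(reads, k):
    """Set of distinct k-mers over all reads (reverse complements not merged)."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}

def simulate_reads(genome, read_len, n_reads, error_rate, rng):
    """Reads sampled uniformly from genome; each base is substituted with probability error_rate."""
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        read = list(genome[start:start + read_len])
        for j, base in enumerate(read):
            if rng.random() < error_rate:
                read[j] = rng.choice([b for b in "ACGT" if b != base])
        reads.append("".join(read))
    return reads

def kmer_growth(ks=(21, 33, 55), seed=42):
    """For each k: (distinct k-mers without errors, distinct k-mers with 1% errors, bits at 2 bits/base)."""
    rng = random.Random(seed)
    genome = "".join(rng.choice("ACGT") for _ in range(5000))
    clean = simulate_reads(genome, 150, 2000, 0.0, rng)
    noisy = simulate_reads(genome, 150, 2000, 0.01, rng)
    result = {}
    for k in ks:
        n_noisy = len(distinct_kmers(noisy, k))
        result[k] = (len(distinct_kmers(clean, k)), n_noisy, n_noisy * 2 * k)
    return result

for k, (n_clean, n_noisy, bits) in kmer_growth().items():
    print(f"k={k}: {n_clean} error-free, {n_noisy} with errors, ~{bits / 8e6:.1f} MB")
```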

Here is the command I use for PE150 reads from bacterial sequencing:

spades.py -k 21,33,55,77 -t 12 -m 50 --careful -o {}/spades -1 {}/{%}_1.fq.gz -2 {}/{%}_2.fq.gz
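The {} and {%} in that command look like placeholders for a parallel job runner. A minimal Python sketch that builds the same spades.py call per sample, assuming rush-style placeholders where {} is a per-sample directory and {%} is its basename (so the reads are named <dir>/<name>_1.fq.gz and <dir>/<name>_2.fq.gz). The sample directories in the usage line are hypothetical. This sketch runs samples one after another; with a parallel runner, -t and -m apply to each job, so the machine needs enough RAM for all concurrent jobs.

```python
import subprocess
from pathlib import Path

def spades_command(sample_dir, kmers=(21, 33, 55, 77), threads=12, mem_gb=50):
    """Build the spades.py argument list for one sample directory.

    Assumes {} = sample directory and {%} = its basename.
    """
    d = Path(sample_dir)
    return [
        "spades.py",
        "-k", ",".join(str(k) for k in kmers),
        "-t", str(threads),
        "-m", str(mem_gb),
        "--careful",
        "-o", str(d / "spades"),
        "-1", str(d / f"{d.name}_1.fq.gz"),
        "-2", str(d / f"{d.name}_2.fq.gz"),
    ]

def run_all(sample_dirs, dry_run=True):
    """Print each command; run them sequentially unless dry_run is True."""
    for sample_dir in sample_dirs:
        cmd = spades_command(sample_dir)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

# Hypothetical sample directories, for illustration only.
run_all(["reads/strainA", "reads/strainB"])
```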


Accepted answer, 2017-02-18

7.2 years ago

Brian Bushnell

15 GB of RAM is tiny for most genome assembly purposes. Also, you're using an old version of SPAdes; newer versions are more memory-efficient, so you should upgrade.

If for some reason you only have 15 GB of RAM available, you can greatly reduce the time and memory SPAdes needs by using the BBMap package to discard low-abundance kmers. Generally, I'd recommend a pipeline something like this (the specifics would require much more information about your data):

# Trim adapters
bbduk.sh in=r1.fq in2=r2.fq out=trimmed.fq ktrim=r k=23 mink=11 hdist=1 ref=/bbmap/resources/adapters.fa tbo tpe

# Normalize
bbnorm.sh in=trimmed.fq out=normalized.fq target=100 min=5

# Assemble (interleaved paired reads are passed with --12)
spades.py -k 21,41,71,101,127 -o out --12 normalized.fq
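The three steps can also be driven from a script. A minimal Python sketch, assuming bbduk.sh, bbnorm.sh and spades.py are on PATH; the parameter values are the ones from the commands above, r1.fq/r2.fq are placeholder names, and the adapter path must point at your own copy of adapters.fa. SPAdes takes interleaved pairs with --12; the intermediate files are interleaved because BBDuk writes both mates to the single out= file.

```python
import shlex
import subprocess

def pipeline(r1, r2, adapters, workdir="."):
    """Return the trim, normalize and assemble commands as argument lists."""
    trimmed = f"{workdir}/trimmed.fq"
    normalized = f"{workdir}/normalized.fq"
    return [
        # Trim adapters: right-trim on k=23 matches (shorter matches down to 11 at
        # read ends), 1 mismatch allowed; tbo trims by pair overlap, tpe trims both mates equally.
        ["bbduk.sh", f"in={r1}", f"in2={r2}", f"out={trimmed}",
         "ktrim=r", "k=23", "mink=11", "hdist=1", f"ref={adapters}", "tbo", "tpe"],
        # Normalize to a target depth of 100, discarding reads below depth 5.
        ["bbnorm.sh", f"in={trimmed}", f"out={normalized}", "target=100", "min=5"],
        # Assemble the interleaved pairs.
        ["spades.py", "-k", "21,41,71,101,127", "-o", f"{workdir}/out", "--12", normalized],
    ]

def run(commands, dry_run=True):
    """Print each command; execute in order, stopping at the first failure, unless dry_run."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

run(pipeline("r1.fq", "r2.fq", "/bbmap/resources/adapters.fa"))
```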

As shenwei indicated, you can also reduce memory consumption by using shorter kmers. But if you are actually interested in getting the best possible assembly, you shouldn't do that; rather, you should find a computer with more memory.



Hi,

Thx Brian and shenwei. I have now upgraded to SPAdes 3.10, but I am getting the same memory problem. I will try BBMap later today. Do you know of any free servers (such as CLIMB for Microbial Genomics, http://www.climb.ac.uk/) available to university research projects for analysing NGS data?

I fully understand that I need to move to a higher-spec machine, but to understand the memory issue I watched the system monitor (screenshot attached) while running SPAdes, and it was only using about 50% of the available RAM at most. The 16 GB of swap space I have is also almost unused. Does this mean that SPAdes pre-calculates the amount of RAM required and throws an error if not enough is available, rather than trying to run with whatever RAM is available plus swap space?

Thx.

Cheers, Ambi.

[Screenshot: system monitor]

Reply by ambi1999, 7.2 years ago


You used the flag "-m 10", which tells SPAdes to terminate if it exceeds 10 GB. Try running without that flag. Also, swap is not going to be very useful for genome assembly; it's just too slow.
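If you do keep -m, it makes sense to set it close to, but below, the RAM the machine can actually provide. This will not create memory that isn't there; it only avoids setting the limit lower than necessary. A small Python sketch, assuming Linux's /proc/meminfo format (values in kB, 1 GB taken as 1024² kB) and an arbitrary 1 GB of headroom for the OS; the helper name suggest_memory_limit is hypothetical. SPAdes's -m takes an integer number of gigabytes.

```python
def suggest_memory_limit(meminfo_text, headroom_gb=1):
    """Return an integer GB value for spades.py -m, based on MemAvailable (or MemTotal)."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # /proc/meminfo values are in kB
    kb = fields.get("MemAvailable", fields.get("MemTotal"))
    if kb is None:
        raise ValueError("no MemAvailable or MemTotal line found")
    return max(1, int(kb / 1024**2) - headroom_gb)
```

On a Linux machine, `suggest_memory_limit(open("/proc/meminfo").read())` gives a value to pass as `-m`.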

Reply by Brian Bushnell, 7.2 years ago


Thx Brian. I tried running without the -m flag and with -m 15, but it still crashes because of low memory. I will try BBMap soon.

Meanwhile, what sort of configuration would you recommend for genome assembly, in case I can get funding for a new local machine? I know a higher spec will be better, but is there a recommended spec?

Thx again, Ambi.

Reply by ambi1999, 7.2 years ago


It's also possible to do this in the cloud, of course; that way, if you run out of memory, you can just rent a bigger computer. For reference, our smallest nodes for assembly have 128 GB. Bacteria are tiny, so they normally don't need that much (32 GB is usually adequate), though a fungus might. But contamination can massively increase the amount of memory needed.

If you are planning to buy a computer dedicated to bioinformatics (assembly, mapping, etc.), I can hardly recommend getting one smaller than 128 GB. More cores are better, but memory is more important than cores. 128 GB nodes with 16 cores and a few TB of disk space are good enough for most purposes as long as you stay away from large organisms (bigger than a hundred Mbp or so) or metagenomes. The right number of cores depends on the primary applications, as some are more parallel than others. For example, mapping scales near-perfectly (and thus benefits greatly from more cores), but assembly less so; SPAdes used an average of ~3 cores, I think, the last time I ran it on a 32-core machine. Some other assemblers scale better, though.
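The point about cores can be made a little more concrete with Amdahl's law. This is a simplifying model, not something measured in this thread: a fraction p of the work parallelizes perfectly and the rest runs on one core. With no parallel overhead, the average number of busy cores equals the speedup, so "~3 cores on a 32-core machine" corresponds to p of about 0.69, and such a job gains little from going beyond 16 cores. The p = 0.99 used for a near-perfectly scaling mapper is an assumed value.

```python
def amdahl_speedup(p, n):
    """Speedup on n cores when a fraction p of the work is perfectly parallel (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / n)

def parallel_fraction(avg_cores, n):
    """Invert Amdahl's law: the p that gives avg_cores average busy cores on n cores.

    With no parallel overhead, average busy cores equals the speedup.
    """
    return (1.0 - 1.0 / avg_cores) / (1.0 - 1.0 / n)

p_spades = parallel_fraction(3, 32)  # from "~3 cores on a 32-core machine"
p_mapper = 0.99                      # assumed near-perfect scaling
for n in (8, 16, 32, 64):
    print(f"{n:>2} cores: assembler x{amdahl_speedup(p_spades, n):.2f}, "
          f"mapper x{amdahl_speedup(p_mapper, n):.2f}")
```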

Reply by Brian Bushnell, 7.2 years ago


Hello Brian, I tried trimming reads with BBDuk using the command above, but during processing it reported "could not find the adapter reference". I have updated the BBMap package. Can you please let me know what's wrong with my command, or do I need to download the reference to my local computer? Thank you!!

Reply by Dũng, 2.2 years ago


The adapters.fa file (containing the sequences of all adapters) should be in the resources directory of the BBMap software folder. Substitute the path to that file on your local computer for ref=/bbmap/resources/adapters.fa in the command.
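To find that file without hunting through folders, a small Python sketch, assuming bbduk.sh is on your PATH and that resources/adapters.fa sits next to it, as in the BBMap download. If bbduk.sh is a symlink (as some package managers install it), the link is followed first; if your installation uses a different layout, the function returns None and you will need to locate the file by hand. The helper name find_adapters is hypothetical.

```python
import shutil
from pathlib import Path

def find_adapters(bbduk="bbduk.sh"):
    """Return the adapters.fa next to bbduk.sh (resources/adapters.fa), or None if not found."""
    exe = shutil.which(bbduk)
    if exe is None:
        return None
    candidate = Path(exe).resolve().parent / "resources" / "adapters.fa"
    return candidate if candidate.is_file() else None
```

The returned path is what goes after `ref=` in the BBDuk command.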