Question: De novo assembly with SPAdes: ERROR K-mer Counting: too many k-mers to fit into available memory limit
ambi1999 (United Kingdom / Cardiff Metropolitan University) wrote, 2.5 years ago:

Hi,

I am getting the following error while doing a de novo assembly with SPAdes on a Linux machine with 15 GB of RAM and more than 50 GB of free disk space. The two input fastq files are about 2.4 GB and 2 GB.

ERROR K-mer Counting The reads contain too many k-mers to fit into available memory limit. Increase memory limit and restart

spades.log:

Command line: spades.py --careful -o WT_ -1 firstfile.fq -2 secondfile.fq -m 10

System information:
  SPAdes version: 3.5.0
  Python version: 2.7.12
  OS: Linux-4.4.0-59-generic-x86_64-with-Ubuntu-16.04-xenial

Output dir: SPAdes-3.5.0-Linux/bin/WT_
Mode: read error correction and assembling
Debug mode is turned OFF

Dataset parameters:
  Multi-cell mode (you should set '--sc' flag if input data was obtained with MDA (single-cell) technology
  Reads:
    Library number: 1, library type: paired-end
      orientation: fr
      left reads: ['firstfile.fq']
      right reads: ['secondfile.fq']
      interlaced reads: not specified
      single reads: not specified
Read error correction parameters:
  Iterations: 1
  PHRED offset will be auto-detected
  Corrected reads will be compressed (with gzip)
Assembly parameters:
  k: automatic selection based on read length
  Mismatch careful mode is turned ON
  Repeat resolution is enabled
  MismatchCorrector will be used
  Coverage cutoff is turned OFF
Other parameters:
  Dir for temp files: tmp
  Threads: 16
  Memory limit (in Gb): 10


======= SPAdes pipeline started. Log can be found here: SPAdes-3.5.0-Linux/bin/WT_/spades.log


===== Read error correction started. 


== Running read error correction tool: SPAdes-3.5.0-Linux/bin/hammer SPAdes-3.5.0-Linux/bin/WT_/corrected/configs/config.info

   0:00:00.000    4M /    4M   INFO  General                 (main.cpp                  :  82)   Loading config from SPAdes-3.5.0-Linux/bin/WT_/corrected/configs/config.info
   0:00:00.000    4M /    4M   INFO  General                 (memory_limit.hpp          :  42)   Memory limit set to 10 Gb
   0:00:00.001    4M /    4M   INFO  General                 (main.cpp                  :  91)   Trying to determine PHRED offset
   0:00:00.001    4M /    4M   INFO  General                 (main.cpp                  :  97)   Determined value is 33
   0:00:00.002    4M /    4M   INFO  General                 (hammer_tools.cpp          :  36)   Hamming graph threshold tau=1, k=21, subkmer positions = [ 0 10 ]
     === ITERATION 0 begins ===
   0:00:00.002    4M /    4M   INFO K-mer Index Building     (kmer_index.hpp            : 467)   Building kmer index
   0:00:00.002    4M /    4M   INFO K-mer Splitting          (kmer_data.cpp             : 127)   Splitting kmer instances into 128 buckets. This might take a while.
   0:00:00.002    4M /    4M   INFO  General                 (file_limit.hpp            :  29)   Open file limit set to 1024
   0:00:00.002    4M /    4M   INFO K-mer Splitting          (kmer_data.cpp             : 145)   Memory available for splitting buffers: 0.416504 Gb
   0:00:00.002    4M /    4M   INFO K-mer Splitting          (kmer_data.cpp             : 153)   Using cell size of 436736
   0:00:00.857    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 167)   Processing firstfile.fq
   0:00:18.381    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 176)   Processed 813597 reads
   0:00:38.048    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 176)   Processed 1673452 reads
   0:00:57.634    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 176)   Processed 2519299 reads
   0:01:16.964    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 176)   Processed 3305418 reads
   0:01:37.462    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 176)   Processed 4168421 reads
   0:01:50.922    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 176)   Processed 4493764 reads
   0:01:50.922    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 167)   Processing secondfile.fq
   0:02:08.666    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 176)   Processed 5591263 reads
   0:02:35.935    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 176)   Processed 6636651 reads
   0:03:20.730    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 176)   Processed 8752362 reads
   0:03:25.466    3G /    3G   INFO K-mer Splitting          (kmer_data.cpp             : 181)   Processed 8987528 reads
   0:03:25.620   32M /    3G   INFO  General                 (kmer_index.hpp            : 345)   Starting k-mer counting.
   0:03:53.603   32M /    3G   INFO  General                 (kmer_index.hpp            : 351)   K-mer counting done. There are 418968448 kmers in total.
   0:03:53.603   32M /    3G   INFO  General                 (kmer_index.hpp            : 353)   Merging temporary buckets.
   0:04:11.857   32M /    3G   INFO K-mer Index Building     (kmer_index.hpp            : 476)   Building perfect hash indices
   0:06:34.813  160M /    7G   INFO  General                 (kmer_index.hpp            : 371)   Merging final buckets.
   0:06:50.161  160M /    7G   INFO K-mer Index Building     (kmer_index.hpp            : 515)   Index built. Total 144936940 bytes occupied (2.7675 bits per kmer).
   0:06:50.161  160M /    7G  ERROR K-mer Counting           (kmer_data.cpp             : 261)   The reads contain too many k-mers to fit into available memory limit. Increase memory limit and restart


== Error ==  system call for: "['SPAdes-3.5.0-Linux/bin/hammer', 'SPAdes-3.5.0-Linux/bin/WT_/corrected/configs/config.info']" finished abnormally, err code: 255
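(Editor's note: as a sanity check, the index density reported in the log above follows directly from the reported totals. This is plain arithmetic on the log's own numbers, not SPAdes code.)

```python
# Figures reported in the hammer log above
total_kmers = 418_968_448
index_bytes = 144_936_940

bits_per_kmer = index_bytes * 8 / total_kmers
print(round(bits_per_kmer, 4))  # matches the reported "2.7675 bits per kmer"
```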

In case you have troubles running SPAdes, you can write to spades.support@bioinf.spbau.ru
Please provide us with params.txt and spades.log files from the output directory.

params.txt

(same command line, system information and parameters as already shown in spades.log above)

Thx for the help.

Cheers, Ambi.

denovo assembly spades
Brian Bushnell (Walnut Creek, USA) answered, 2.5 years ago:

15 GB of RAM is tiny for most genome-assembly purposes. Also, you're using an old version of SPAdes; newer versions are more memory-efficient, so you should upgrade.

If for some reason you only have 15 GB of RAM available, you can greatly reduce the time and memory SPAdes will need by using the BBMap package to discard low-abundance k-mers. Generally, I'd recommend a pipeline something like this (the specifics would require much more information about your data):

#Trim adapters
bbduk.sh in=r1.fq in2=r2.fq out=trimmed.fq ktrim=r k=23 mink=11 hdist=1 ref=/bbmap/resources/adapters.fa tbo tpe

#Normalize
bbnorm.sh in=trimmed.fq out=normalized.fq target=100 min=5

#Assemble
spades.py -k 21,41,71,101,127 -o out -12 normalized.fq
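(Editor's note: bbnorm.sh in the pipeline above works by digital normalization: a read is discarded once its k-mers have already been seen at the target depth, so redundant coverage, and the error k-mers it carries, never enter the assembler. The following is a toy sketch of that principle only; the function name is invented here, and the real BBNorm uses an approximate count-min-sketch-style counter and is far more sophisticated.)

```python
from collections import Counter

def normalize(reads, k=21, target=100):
    """Toy digital normalization: keep a read only while the median
    count of its k-mers (among reads kept so far) is below target."""
    counts = Counter()
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        depths = sorted(counts[km] for km in kmers)
        median = depths[len(depths) // 2] if depths else 0
        if median < target:
            kept.append(read)
            counts.update(kmers)   # only kept reads contribute k-mers
    return kept

# Five identical reads at target depth 3: the redundant copies are dropped
print(len(normalize(["ACGTACGTAC"] * 5, k=4, target=3)))  # 2
```

This is why normalization caps memory: past the target depth, additional reads add almost no new k-mers to count.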

As shenwei indicated, you can also reduce memory consumption by using shorter k-mers. But if you are actually interested in getting the best possible assembly, you shouldn't do that; rather, you should find a computer with more memory.


Hi,

Thx Brian and shenwei. I have now upgraded to SPAdes 3.10 but am getting the same memory problem. I will try BBMap later today. Do you know of any free servers (such as CLIMB for microbial genomics, http://www.climb.ac.uk/) available for university research projects to analyse NGS data?

I fully understand that I need to move to a higher-spec machine, but just to understand the memory issue, I had a look at the system monitor (screenshot attached) while running SPAdes, and it was only using about 50% of the available RAM at most. Also, the 16 GB of swap space that I have is almost unused. Does this mean that SPAdes pre-calculates the amount of RAM required and throws an error if enough memory is not available, without attempting to run with whatever RAM is available and using swap space?

Thx.

Cheers, Ambi.

[system monitor screenshot]

(written 2.5 years ago by ambi1999)

You used the flag "-m 10", which tells SPAdes to abort if it exceeds 10 GB. Try running without that flag. Also, swap is not going to be very useful for genome assembly; it's just too slow.

(written 2.5 years ago by Brian Bushnell)

Thx Brian. I tried running without the -m flag and with -m 15, but it still crashes because of low memory. I will try BBMap soon.

Meanwhile, what sort of configuration would you recommend for genome assembly, in case I can manage some funding for a new local machine? I know a higher spec will be better, but is there a particular spec that is recommended?

Thx again, Ambi.

(written 2.5 years ago by ambi1999)

It's also possible to do this stuff in the cloud, of course; that way, if you run out of memory, you can just rent a bigger computer. For reference, our smallest nodes for assembly have 128 GB. Bacteria are tiny, so they normally don't need that much (32 GB is normally adequate), though a fungus might. But contamination can massively increase the amount of memory needed.

If you are planning to buy a computer dedicated to bioinformatics (assembly, mapping, etc.), I wouldn't recommend getting one smaller than 128 GB. Also, more cores are better, but memory is more important than cores. 128 GB nodes with 16 cores and a few TB of disk space are good enough for most purposes, as long as you stay away from large organisms (bigger than a hundred Mbp or so) or metagenomes. How many cores you need depends on the primary applications, as some are more parallel than others. For example, mapping scales near-perfectly (and thus benefits greatly from more cores), but assembly less so; SPAdes, for example, used an average of ~3 cores, I think, the last time I ran it on a 32-core machine. Some other assemblers scale better, though.
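(Editor's note: the scaling point above can be made concrete with Amdahl's law; the numbers below are illustrative, not measurements of any real tool.)

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: overall speedup when only part of the work parallelizes."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# Even a hypothetical 95%-parallel tool on 32 cores gets nowhere near 32x,
# which is why extra cores help mapping far more than assembly.
print(round(amdahl_speedup(0.95, 32), 2))  # 12.55
```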

(written 2.5 years ago by Brian Bushnell)
ambi1999 (United Kingdom / Cardiff Metropolitan University) answered, 2.5 years ago:

Hi Brian,

I have now managed to do the de novo assembly on my local machine by first normalising with your BBNorm package. It required far less memory (around 6 GB, compared to 15 GB without normalisation).

Many thanks for recommending BBNorm. I just noticed that you are the developer of this package as well, which is a great contribution to the community. Very well done, and thanks once again.

PS: I've got another question regarding this project, but I will post it as a separate question as it is not related to the de novo assembly.

Cheers, Ambi.


Hi Ambi,

I'm happy BBNorm was helpful in this case! To close this thread as resolved, if you feel that it has in fact been resolved, please accept the answer.

Thanks, Brian

(written 2.5 years ago by Brian Bushnell)

Just accepted the answer. Thx again for your help.

Ambi.

(written 2.5 years ago by ambi1999)
shenwei356 (China) answered, 2.5 years ago:

ERROR K-mer Counting The reads contain too many k-mers to fit into available memory limit. Increase memory limit and restart

Command line: spades.py --careful -o WT_ -1 firstfile.fq -2 secondfile.fq -m 10

Choose a smaller max k-mer size (-k) or switch to a server with more RAM and disk space.

Here's mine for PE150 reads from bacterial sequencing:

spades.py -k 21,33,55,77 -t 12 -m 50 --careful -o {}/spades -1 {}/{%}_1.fq.gz -2 {}/{%}_2.fq.gz

Powered by Biostar version 2.3.0