Question: Is 1.4 Gb too large a genome for SPAdes?
17 months ago, AGE (20) wrote:

I want to assemble a reptile genome with SPAdes, since it has given me no problems installing, unlike many other programs (e.g. MaSuRCA, Velvet, SOAPdenovo, etc.). I'm wondering if a diploid genome of size 1.4 Gb is too large for this program.

The program is unlikely to have any issues with the genome size specifically. What hardware do you have available?

Reply written 17 months ago by Joe (18k)

I have access to a cluster. The last time the program ran out of memory, it estimated that I need approx. 500 GB of RAM.

Reply written 17 months ago by AGE (20)

How many reads do you have, and what is your estimated coverage? You may be able to downsample your reads. Needing more than 500 GB of RAM doesn’t sound right to me, though. Something else may be going on.

Reply written 17 months ago by Joe (18k)
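
For the coverage question above, a rough back-of-the-envelope check may help. This is only a sketch: the read count (700 million), read length (150 bp), and target depth (50x) are made-up illustration values, not numbers from this thread, and the function names are hypothetical. Coverage is taken as total sequenced bases divided by genome size.

```python
def estimated_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def downsample_fraction(current_coverage: float, target_coverage: float) -> float:
    """Fraction of reads (or read pairs) to keep to reach the target depth."""
    if current_coverage <= 0:
        raise ValueError("current_coverage must be positive")
    # Never ask for more reads than are available.
    return min(1.0, target_coverage / current_coverage)


# Hypothetical inputs: 700 million reads of 150 bp for a 1.4 Gb genome.
coverage = estimated_coverage(700_000_000, 150, 1_400_000_000)
# Hypothetical target depth of 50x; pick whatever suits your data.
fraction = downsample_fraction(coverage, target_coverage=50.0)
print(f"coverage ~{coverage:.0f}x, keep {fraction:.2f} of the reads")
```

If the fraction comes out below 1, the same fraction has to be applied to both files of a paired-end library so that mates stay in sync.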
17 months ago, Buffo (1.8k) wrote:

I have read papers that assembled genomes of almost 3 Gb with SPAdes (if I remember the reference I will post it), so the size should not be a problem. However, I have personally had problems with complex genomes (those with large, short, and tandem repeats), which produced very fragmented and redundant assemblies. In addition, to assemble a 50 Mb diploid genome from 80 million paired reads I needed about 40-50 GB of RAM using default parameters; with a larger k-mer size it was impossible.

Thanks for the info! Yes, it does use quite a bit of RAM. I tried running the program a few weeks ago and it estimated that I needed 500 GB of RAM.

Reply written 17 months ago by AGE (20)
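
Since memory and k-mer size keep coming up, here is a minimal sketch of how a SPAdes run with an explicit memory limit and k-mer list might be put together. It only builds the command line and does not run anything. The file names, the 500 GB limit, the thread count, and the helper name `build_spades_command` are all placeholders for illustration. The flags used (`-1`, `-2`, `-o`, `-m`, `-t`, `-k`) are the ones described in the SPAdes manual, where `-m` is a memory limit in GB at which SPAdes terminates and `-k` takes a comma-separated list of odd k-mer sizes below 128; check the manual for your installed version before relying on this.

```python
from typing import List, Optional, Sequence


def build_spades_command(
    reads_1: str,
    reads_2: str,
    out_dir: str,
    memory_gb: int = 250,   # assumed default; verify against your SPAdes version
    threads: int = 16,      # assumed default; verify against your SPAdes version
    kmers: Optional[Sequence[int]] = None,
) -> List[str]:
    """Build (but do not run) a paired-end spades.py command line."""
    cmd = [
        "spades.py",
        "-1", reads_1,
        "-2", reads_2,
        "-o", out_dir,
        "-m", str(memory_gb),
        "-t", str(threads),
    ]
    if kmers:
        # The manual asks for odd values below 128 in ascending order.
        if any(k % 2 == 0 or k >= 128 for k in kmers):
            raise ValueError("k-mer sizes must be odd and less than 128")
        cmd += ["-k", ",".join(str(k) for k in sorted(kmers))]
    return cmd


# Placeholder file names and resources, for illustration only.
cmd = build_spades_command(
    "reads_R1.fastq.gz", "reads_R2.fastq.gz", "spades_out",
    memory_gb=500, threads=32, kmers=[21, 33, 55],
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a node that really has the requested memory; raising `-m` does not reduce what SPAdes needs, it only stops the run from being cut off at a lower limit.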
17 months ago, h.mon (31k) wrote:

From the SPAdes manual:

Note, that SPAdes was initially designed for small genomes. It was tested on bacterial (both single-cell MDA and standard isolates), fungal and other small genomes. SPAdes is not intended for larger genomes (e.g. mammalian size genomes). For such purposes you can use it at your own risk.

As Buffo noted, it is possible to use SPAdes with large genomes, and I have used it myself. But it was hit or miss: very often it would fail due to using too much memory, or SPAdes would spit out some error. Again as Buffo noted, complex genomes, or lower-quality data, can hugely increase memory usage, rendering SPAdes impractical.

Regarding installation problems, (mini)conda may be of great help.
