Question: Is a 1.4 Gb genome too large for SPAdes?
6 months ago, AGE wrote:

I want to assemble a reptile genome with SPAdes, since it installed without any problems, unlike many other programs (e.g. MaSuRCA, Velvet, SOAPdenovo). I'm wondering whether a 1.4 Gb diploid genome is too large for this program.

Tags: assembly, genome
Modified 6 months ago by h.mon.

The program itself is unlikely to have a problem with a genome of that size. What hardware do you have available?

written 6 months ago by Joe

I have access to a cluster. The last time SPAdes ran out of memory, it estimated that I would need approximately 500 GB of RAM.

written 6 months ago by AGE
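For context on the 500 GB figure: SPAdes enforces a RAM ceiling set with `-m`/`--memory` (in GB; the manual gives 250 as the default) and terminates when a run reaches it. A minimal sketch, assuming paired-end input with hypothetical file names and a node that physically has that much memory, which builds (but does not run) the command with a raised limit:

```python
import shlex


def build_spades_cmd(r1, r2, outdir, threads=32, mem_gb=500):
    """Return a spades.py argument list; nothing is executed here.

    File names, thread count and memory limit are illustrative
    assumptions, not values taken from this thread.
    """
    if mem_gb <= 0:
        raise ValueError("memory limit must be a positive number of GB")
    return [
        "spades.py",
        "-1", r1,            # forward reads
        "-2", r2,            # reverse reads
        "-o", outdir,        # output directory
        "-t", str(threads),  # number of threads
        "-m", str(mem_gb),   # RAM limit in GB; SPAdes stops if it is reached
    ]


if __name__ == "__main__":
    cmd = build_spades_cmd("reads_R1.fastq.gz", "reads_R2.fastq.gz", "spades_out")
    print(shlex.join(cmd))  # shlex.join needs Python 3.8+
```

Passing the returned list to `subprocess.run(cmd, check=True)` would launch the assembly. Raising `-m` only helps if the node really has that much RAM, so reducing or cleaning the input reads may be the better first step.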

How many reads do you have, and what is the estimated coverage? You may be able to downsample your reads. Needing more than 500 GB of RAM doesn't sound right to me, though; something else may be going on.

written 6 months ago by Joe
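Joe's questions come down to simple arithmetic: mean coverage is total sequenced bases divided by genome size, and the fraction of read pairs to keep is the target coverage divided by the current one. A minimal Python sketch, assuming plain or gzipped paired FASTQ files whose mates appear in the same order; the 40x target and the read counts in the example are hypothetical:

```python
import gzip
import random


def estimate_coverage(n_reads, read_len, genome_size):
    """Mean depth = total sequenced bases / genome size.

    n_reads counts every read, so N read pairs contribute 2 * N reads.
    """
    return n_reads * read_len / genome_size


def subsample_fraction(current_cov, target_cov):
    """Fraction of read pairs to keep to reach roughly target_cov."""
    return min(1.0, target_cov / current_cov)


def _open(path, mode):
    # Handle gzip-compressed and plain FASTQ transparently.
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def subsample_pairs(r1_in, r2_in, r1_out, r2_out, fraction, seed=42):
    """Keep each read pair with probability `fraction`; return pairs kept.

    Reads one 4-line FASTQ record from each file per step so mates stay
    in sync; stops at the end of the shorter file.
    """
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    kept = 0
    with _open(r1_in, "rt") as f1, _open(r2_in, "rt") as f2, \
         _open(r1_out, "wt") as o1, _open(r2_out, "wt") as o2:
        while True:
            rec1 = [f1.readline() for _ in range(4)]
            rec2 = [f2.readline() for _ in range(4)]
            if not rec1[0] or not rec2[0]:
                break  # end of input
            if rng.random() < fraction:
                o1.writelines(rec1)
                o2.writelines(rec2)
                kept += 1
    return kept


if __name__ == "__main__":
    # Hypothetical numbers: 300 million 150 bp pairs on a 1.4 Gb genome.
    cov = estimate_coverage(2 * 300_000_000, 150, 1_400_000_000)
    frac = subsample_fraction(cov, 40)
    print(f"coverage ~{cov:.1f}x, keep {frac:.2f} of pairs for ~40x")
```

For real data, a dedicated tool such as seqtk (`seqtk sample` run with the same `-s` seed on both files) does the same job much faster; the sketch only shows the logic.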
6 months ago, Buffo wrote:

I have read papers that assembled genomes of almost 3 Gb with SPAdes (if I remember the reference I will post it), so the length alone should not be a problem. However, I personally had problems with complex genomes (with large, short and tandem repeats): they produced very fragmented and redundant assemblies. In addition, assembling a 50 Mb diploid genome from 80 million paired reads took about 40-50 GB of RAM with default parameters, and with a larger k-mer size it was impossible.

written 6 months ago by Buffo
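On the k-mer point: when k is set by hand, SPAdes takes `-k` as a comma-separated list of odd values below 128 and runs one assembly iteration per value, so each extra k adds another pass. A small hypothetical helper (not part of SPAdes) that checks such a list before a long cluster job:

```python
def spades_kmer_arg(kmers):
    """Format a k-mer list for spades.py's -k option.

    Rejects even values and values outside 1..127, then returns the
    sorted, de-duplicated sizes as a comma-separated string.
    """
    ks = sorted(set(kmers))
    bad = [k for k in ks if k % 2 == 0 or not 0 < k < 128]
    if bad:
        raise ValueError(f"invalid k-mer sizes for SPAdes: {bad}")
    return ",".join(str(k) for k in ks)


if __name__ == "__main__":
    print(spades_kmer_arg([21, 33, 55]))  # -> 21,33,55
```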

Thanks for the info! Yes, it does use quite a bit of RAM. I tried running the program a few weeks ago and it estimated that I needed 500 GB of RAM.

written 6 months ago by AGE
6 months ago, h.mon (Brazil) wrote:

From the SPAdes manual:

Note, that SPAdes was initially designed for small genomes. It was tested on bacterial (both single-cell MDA and standard isolates), fungal and other small genomes. SPAdes is not intended for larger genomes (e.g. mammalian size genomes). For such purposes you can use it at your own risk.

As Buffo noted, it is possible to use SPAdes with large genomes, and I have done so myself. But it was hit or miss: very often it would fail by using too much memory, or SPAdes would throw some other error. Again as Buffo noted, complex genomes or lower-quality data can hugely increase memory usage, making SPAdes impractical.

Regarding installation problems, (mini)conda may be of great help.

written 6 months ago by h.mon