Parallelization Efficiency And Memory Usage Of ABySS, ALLPATHS-LG
12.1 years ago
David M ▴ 580

Has anyone looked into the parallelization efficiency or the maximum memory usage of the ABySS or ALLPATHS-LG de novo assemblers? I'd like to apply for access to a computing resource and I need to gather this information for the proposal. It's difficult to estimate because the memory requirements are fairly large even for a medium-sized dataset.

assembly memory • 3.1k views

The resources required for de novo assembly depend on the genome size and the nature of the repeats in the species you will be sequencing. It is difficult to give a relevant answer without knowing something about the species' genome. Is it a mammal, plant, bacterium, or some other group?


Large eukaryotic genome. Mostly I'm interested in how the efficiency of ABySS (for example) scales with multiple processors.

12.1 years ago
Sujai Kumar ▴ 270

Hi David

One data point for your answer:

Genome size: ~100Mbp metazoan

Insert size: ~300 bp

Tech : 100bp Illumina paired end HiSeq2000 V3

Num Reads : ~140 M pairs
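For context, the figures above imply a rough sequencing depth. The snippet below is only a back-of-the-envelope sketch: the function name `fold_coverage` is made up for illustration, and it assumes every read is a full 100 bp and the genome is exactly 100 Mbp, which the "~" values above only approximate.

```python
def fold_coverage(read_pairs, read_length_bp, genome_size_bp):
    """Approximate fold coverage: total sequenced bases / genome size."""
    total_bases = read_pairs * 2 * read_length_bp  # two reads per pair
    return total_bases / genome_size_bp

# Values taken from the data point above (all approximate).
coverage = fold_coverage(read_pairs=140_000_000,
                         read_length_bp=100,
                         genome_size_bp=100_000_000)
print(f"Approximate coverage: {coverage:.0f}x")
```

With these inputs the estimate comes out at roughly 280x, so the timing and memory figures reported here are for a deeply sequenced dataset.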

ABySS 1.3.3 took 6 hours on a 32-core machine. The read set was split into 20 interleaved files (if you split the reads into interleaved FASTA files, the reading-into-memory stage is sped up because the cores can read the files in parallel). The total memory consumption was about 2-3GB/core, i.e. 60-80GB.
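For a proposal, these numbers can be turned into a rough resource request. The sketch below is just the arithmetic, with hypothetical helper names; it assumes all 32 cores were running ABySS processes for the full 6 hours, which is not stated explicitly above.

```python
def total_memory_gb(cores, gb_per_core):
    """Aggregate memory if every core uses gb_per_core."""
    return cores * gb_per_core

def core_hours(cores, wall_hours):
    """Core-hours consumed by a job that occupies all cores."""
    return cores * wall_hours

cores = 32        # machine size from the data point
wall_hours = 6    # reported wall-clock time

# Bounds from the reported 2-3 GB per core.
low_gb = total_memory_gb(cores, 2)
high_gb = total_memory_gb(cores, 3)
budget = core_hours(cores, wall_hours)

print(f"Memory if all cores are used: {low_gb}-{high_gb} GB")
print(f"Compute budget: {budget} core-hours")
```

The simple product gives 64-96 GB, so the reported 60-80GB total sits in the lower part of that range. This is a single data point for a ~100Mbp genome and should not be extrapolated linearly to a large eukaryotic genome.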

Not an analytical/comprehensive answer, but am hoping other people will pitch in and provide more data points.

Cheers,

Sujai