Question

Abyss Speed-Up Tricks

3

12.2 years ago

Darked89 4.6k

I am trying to assemble a 30-35 Mbp diploid genome using Abyss from Illumina HiSeq runs. With default settings on a 12-CPU, large-RAM machine, it takes a very long time (days) to compute an assembly for just one k-mer size.

Hence my questions:

1. What is your experience with abyss-bwa and abyss-bowtie, in terms of both performance and quality of assembly/scaffolding?
2. I use NFS-mounted partitions for both data and temp directories, which I guess slows down Abyss. How do I estimate how much local disk space I will need for a local temp directory?
3. Compression settings. I have found this Biostar post about pbzip2 and speed. Has anyone done comparisons with gzip/pigz?
4. Scaffolding off. In most of my runs, Abyss seems to spend the majority of its time mapping reads to the assembly and scaffolding. Since I am exploring k-mer space, I want to get contigs, check N50s, compare the assembly with genomes of related species, then pick a few good-looking k-mers and rerun the assembly with, e.g., differently filtered or base-error-corrected data sets. Can I switch off the whole scaffolding part?
5. OpenMPI & Abyss: are there any requirements, e.g. minimum RAM per cluster node, to run Abyss without crashing?
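
For the "get contigs, check N50s" step in point 4, here is a minimal Python sketch of an N50 calculation over a contig FASTA file. It is not part of ABySS; the function names `n50` and `contig_lengths` are made up for this illustration, and it assumes a plain, uncompressed FASTA with `>` header lines.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty).

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


def contig_lengths(fasta_lines):
    """Yield one sequence length per record from an iterable of FASTA lines."""
    current = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                yield current
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        yield current
```

Usage would be something like `n50(contig_lengths(open("contigs.fa")))`, where `contigs.fa` is a placeholder file name, run once per k-mer size so the values can be compared.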

Yes, I know there is an Abyss mailing list, but it takes a long time to get an answer from the overworked developer. Trawling through the archives has not given me clear answers so far.

Thanks a lot for your help.

EDIT (partial answers)

ad 1: according to the ABySS author, the default mapper/scaffolder gives better-quality results than abyss-bwa and abyss-bowtie

ad 4: the answer is: "abyss-pe pe-contigs other_switches_go_here"
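
To explore k-mer space with the contigs-only target above, one could generate one command per k value. The Python sketch below only builds the command lines and does not run them. The `pe-contigs` target comes from the answer above; `k=`, `name=` and `in=` are the usual abyss-pe parameters, while the read file names, the k range and the helper name `abyss_contig_commands` are assumptions made for illustration.

```python
import shlex


def abyss_contig_commands(read_files, kmers, prefix="asm"):
    """Build one contigs-only abyss-pe command (as an argv list) per k value."""
    commands = []
    for k in kmers:
        commands.append([
            "abyss-pe",
            f"k={k}",
            f"name={prefix}-k{k}",            # distinct output prefix per k
            "in=" + " ".join(read_files),     # space-separated read files
            "pe-contigs",                     # stop before scaffolding
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical read files and k range; adjust to the actual data set.
    for cmd in abyss_contig_commands(["reads_1.fq", "reads_2.fq"], range(31, 64, 8)):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check them by hand or feed them to a job scheduler before committing days of CPU time.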

genome assembly • 3.8k views

ADD COMMENT • link updated 12.1 years ago by Lars Juhl Jensen 11k • written 12.2 years ago by Darked89 4.6k

2

I think you can figure this out yourself. To speed things up, the key is to identify the bottleneck. Just run Abyss normally and check "top" every half an hour to see which steps take most of the time. My guess is that graph construction and simplification take most of the time. As for scaffolding, if you assemble the reads as single-end, I guess scaffolding will be skipped.

ADD REPLY • link 12.2 years ago by lh3 33k

1

I would also give SOAPdenovo and SGA a try.

ADD REPLY • link 12.2 years ago by lh3 33k

0

Days seems excessive - I'd expect hours on my server (24 CPU, 100 GB RAM). Assembly can often be "held up" by a very small number of "rogue reads" which mess up the graph, so you may want to look at some quality filtering to reduce the number of input reads.
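
As a rough illustration of the quality-filtering idea, here is a minimal Python sketch that drops FASTQ reads whose mean Phred score falls below a threshold. The threshold of 20 and the Phred+33 offset are assumptions (older Illumina pipelines used an offset of 64), and the function names are made up for this example; a dedicated read-trimming tool would normally be the better choice.

```python
def mean_phred(quality_line, offset=33):
    """Mean Phred score of one FASTQ quality string (Phred+33 by default)."""
    return sum(ord(c) - offset for c in quality_line) / len(quality_line)


def filter_fastq(lines, min_mean_quality=20.0, offset=33):
    """Yield 4-line FASTQ records (as tuples) whose mean quality passes.

    Assumes strict 4-line records: header, sequence, '+', quality.
    """
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            quality = record[3]
            # Skip empty reads and reads below the quality threshold.
            if quality and mean_phred(quality, offset) >= min_mean_quality:
                yield tuple(record)
            record = []
```

Note that for paired-end data both mates of a pair would have to be kept or dropped together so the two files stay in sync; this sketch handles a single file only.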

ADD REPLY • link 12.2 years ago by Neilfws 49k