I am trying to assemble a 30-35 Mbp diploid genome with ABySS from Illumina HiSeq runs. With default settings on a 12-CPU, large-RAM machine, computing the assembly for just one k-mer value takes a very long time (days).
Hence my questions:
- What is your experience with abyss-bwa and abyss-bowtie, in terms of both performance and assembly/scaffolding quality?
- I use NFS-mounted partitions for both the data and temp directories, which I suspect slows ABySS down. How can I estimate how much local disk space I will need for a local temp directory?
- Compression settings: I found this Biostar post about pbzip2 and speed. Has anyone compared it with gzip/pigz?
- Switching scaffolding off: in most of my runs, ABySS seems to spend the bulk of its time mapping reads to the assembly for scaffolding. Since I am exploring k-mer space, I want to get contigs, check N50s and compare the assembly with genomes of related species, then pick a few good-looking k values and rerun the assembly with, e.g., differently filtered or base-error-corrected data sets. Can I switch off the whole scaffolding step?
- OpenMPI and ABySS: are there any requirements, e.g. a minimum amount of RAM per cluster node, for running ABySS without crashing?
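On the compression question, one way to get numbers for your own data is to time both formats on a sample of your reads. The sketch below uses Python's standard-library `gzip` and `bz2` modules, which are single-threaded, so it compares the formats' compression ratio and relative cost rather than pigz/pbzip2 wall-clock times. The synthetic FASTQ generator is a hypothetical stand-in; random reads compress differently from real ones, so run it on a slice of an actual FASTQ file.

```python
import bz2
import gzip
import random
import time


def synthetic_fastq(n_reads: int, read_len: int = 100, seed: int = 0) -> bytes:
    """Hypothetical stand-in for real reads: random bases and quality characters."""
    rng = random.Random(seed)
    records = []
    for i in range(n_reads):
        seq = "".join(rng.choice("ACGT") for _ in range(read_len))
        qual = "".join(rng.choice("#;?@ABCDEFGHI") for _ in range(read_len))
        records.append(f"@read{i}\n{seq}\n+\n{qual}\n")
    return "".join(records).encode("ascii")


def compare_compressors(data: bytes, level: int = 6) -> dict:
    """Compress `data` with gzip and bzip2 at the same level; report ratio and seconds."""
    results = {}
    for name, compress, decompress in (
        ("gzip", gzip.compress, gzip.decompress),
        ("bzip2", bz2.compress, bz2.decompress),
    ):
        start = time.perf_counter()
        packed = compress(data, compresslevel=level)
        elapsed = time.perf_counter() - start
        # Sanity check: the round trip must reproduce the input exactly.
        if decompress(packed) != data:
            raise RuntimeError(f"{name} round trip failed")
        results[name] = {"ratio": len(data) / len(packed), "seconds": elapsed}
    return results


if __name__ == "__main__":
    # For real data, use e.g. open("sample.fastq", "rb").read() instead.
    sample = synthetic_fastq(5_000)
    for name, stats in compare_compressors(sample).items():
        print(f"{name}: ratio {stats['ratio']:.2f}, {stats['seconds']:.3f} s")
```

Since pigz and pbzip2 write standard gzip and bzip2 streams, the ratios should carry over, while their parallel speed-up depends on the number of cores.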
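For the contig-level k-mer survey, the usual comparison metric is N50: the length of the shortest contig among the longest contigs that together contain at least half of the assembled bases. ABySS ships its own abyss-fac tool for contiguity statistics, so the following is only a minimal, dependency-free illustration of the calculation. The function names and the optional `min_length` cutoff (for ignoring very short contigs) are assumptions of this sketch.

```python
from typing import Iterable, List, Optional


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths: List[int] = []
    current: Optional[int] = None  # None until the first header is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)  # sequences may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int], min_length: int = 0) -> int:
    """Shortest contig among the longest contigs covering at least half the bases."""
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    half_total = sum(kept) / 2
    running = 0
    for length in kept:
        running += length
        if running >= half_total:
            return length
    return 0  # no contigs passed the cutoff


if __name__ == "__main__":
    example = [">c1", "ACGTACGT", ">c2", "ACG", "TAC", ">c3", "A"]
    lengths = fasta_lengths(example)  # [8, 6, 1]
    print(lengths, n50(lengths))
```

To compare k values, open each run's contig FASTA and pass the file handle to `fasta_lengths`; the file names depend on the `name` given to abyss-pe.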
Yes, I know there is an ABySS mailing list, but it takes a long time to get an answer from the overworked developer, and trawling through the archives has not given me clear answers so far.
Thanks a lot for your help.
EDIT (partial answers)
ad 1: according to the ABySS author, the default mapper/scaffolder gives better quality than abyss-bwa and abyss-bowtie.
ad 4: the answer is to request the pe-contigs target explicitly: "abyss-pe pe-contigs other_switches_go_here"
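Since the aim is to sweep several k values and stop at contigs, that answer can be scripted. The sketch below builds one abyss-pe command per k, each running in its own `k<value>` directory via make's `-C` option (abyss-pe is a Makefile-based driver), and executes them only when `dry_run=False`. The `pe-contigs` target comes from the answer above; `name`, `k` and `in` are standard abyss-pe parameters, while the helper name, directory naming and example values are hypothetical. Input paths are made absolute because `-C` changes the working directory.

```python
import os
import subprocess
from typing import Iterable, List, Sequence


def abyss_kmer_sweep(
    name: str,
    reads: Sequence[str],
    kmers: Iterable[int],
    target: str = "pe-contigs",
    extra: Sequence[str] = (),
    dry_run: bool = True,
) -> List[List[str]]:
    """Build (and optionally run) one abyss-pe invocation per k, each in k<value>/."""
    # One make variable assignment holding all input files, like in='r1 r2' in a shell.
    reads_arg = " ".join(os.path.abspath(r) for r in reads)
    commands = []
    for k in kmers:
        workdir = f"k{k}"
        cmd = ["abyss-pe", "-C", workdir, target,
               f"name={name}", f"k={k}", f"in={reads_arg}", *extra]
        commands.append(cmd)
        if not dry_run:
            os.makedirs(workdir, exist_ok=True)  # make -C needs the directory to exist
            subprocess.run(cmd, check=True)      # stop the sweep if one run fails
    return commands
```

For example, `abyss_kmer_sweep("mygenome", ["reads_1.fq", "reads_2.fq"], range(41, 72, 10), extra=["np=12"], dry_run=False)` would run four contig-only assemblies; inspect the returned commands with the default `dry_run=True` first, and check that your ABySS version still provides the pe-contigs target.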