Hello,
I am running Canu 2.1.1 to assemble PacBio Sequel II reads for an insect with an estimated genome size of 1.9 Gbp, with all jobs submitted through SLURM. I first ran Canu with default parameters but ran out of storage space (20 TB), so I checked the Canu FAQ and added the options recommended under "My assembly is running out of space, is too slow?":
corMhapFilterThreshold=0.0000000002 corMhapOptions="--threshold 0.80 --num-hashes 512 --num-min-matches 3 --ordered-sketch-size 1000 --ordered-kmer-size 14 --min-olap-length 2000 --repeat-idf-scale 50" mhapMemory=60g mhapBlockSize=500 ovlMerDistinct=0.975
The canu command I ran was:
canu -p veletis -d output corMhapFilterThreshold=0.0000000002 corMhapOptions="--threshold 0.80 --num-hashes 512 --num-min-matches 3 --ordered-sketch-size 1000 --ordered-kmer-size 14 --min-olap-length 2000 --repeat-idf-scale 50" mhapMemory=60g mhapBlockSize=500 ovlMerDistinct=0.975 gridOptions="--time=15:00:00" genomeSize=1.96g -pacbio ~/projects/def-bsincla7/arteen/veletis/raw_reads/pacbio/fastq/longread_c1.fastq ~/projects/def-bsincla7/arteen/veletis/raw_reads/pacbio/fastq/longread_c2.fastq
I let this run, but it is only about halfway through the correction/1-overlapper jobs and is already using 15 TB of the 20 TB I have available. Most of that space is taken up by the *.ovb files in correction/1-overlapper/results.
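A rough sketch of how the usage can be measured and extrapolated, assuming the run directory is `output` as in the canu command above. The `project_total` helper and the job counts passed to it are hypothetical placeholders, not numbers from this run, and the projection assumes every overlap job writes about the same amount of data:

```shell
#!/usr/bin/env bash
# Linear extrapolation of final disk usage from the space used so far.
# Arguments: space used so far (TB), jobs finished, total jobs.
# Assumption: each overlap job writes roughly the same amount of data.
project_total() {
    awk -v used="$1" -v finished="$2" -v total="$3" \
        'BEGIN { printf "%.1f\n", used * total / finished }'
}

# Size of the overlap results and number of .ovb files written so far.
# The path assumes the canu run directory is "output" (-d output).
results=output/correction/1-overlapper/results
if [ -d "$results" ]; then
    du -sh "$results"
    find "$results" -name '*.ovb' | wc -l
fi

# Placeholder job counts: 15 TB used with 50 of 100 jobs finished.
project_total 15 50 100
```

With about half of the jobs done at 15 TB, this kind of estimate lands near 30 TB for the overlap stage alone, which is well over the 20 TB available.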
Does anyone know of other options or tweaks I could use to reduce the storage footprint? I hope that makes sense; I can clarify anything if necessary.
Thanks for the help!