I am attempting to assemble the dataset from ERP002469 using megahit. The dataset consists of ~140 paired-end FASTQ files, each between 2 and 10 GB in size, about 1 TB in total.
I am currently running the assembly with the k list 27,37,47,57,67,77,87,97,107,117 on a 512 GB RAM node using 20 cores. It has been running for around 30 hours, and the last log entry is: Assembling contigs from SdBG for k = 37 ---
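To make the setup easier to reproduce, here is a minimal Python sketch of how an invocation with this k list could be put together. It is not the exact command from the run: the FASTQ file names and the output directory are hypothetical placeholders, and `build_megahit_command` is a helper name made up for illustration. Only the k list and the 20 threads come from the description above; `-1`, `-2`, `--k-list`, `-t` and `-o` are the megahit options for paired-end input, k-mer sizes, thread count and output directory.

```python
import shlex

# k-mer sizes from the run described above
K_LIST = [27, 37, 47, 57, 67, 77, 87, 97, 107, 117]


def build_megahit_command(pairs, k_list=K_LIST, threads=20, out_dir="megahit_out"):
    """Build a megahit argument list for paired-end input.

    pairs   -- list of (forward_fastq, reverse_fastq) tuples (hypothetical names)
    out_dir -- hypothetical output directory, not taken from the original run
    """
    if not pairs:
        raise ValueError("at least one pair of FASTQ files is required")
    # megahit takes comma-separated file lists; the order must match between -1 and -2
    forward = ",".join(fwd for fwd, _ in pairs)
    reverse = ",".join(rev for _, rev in pairs)
    return [
        "megahit",
        "-1", forward,
        "-2", reverse,
        "--k-list", ",".join(str(k) for k in k_list),
        "-t", str(threads),
        "-o", out_dir,
    ]


if __name__ == "__main__":
    # Placeholder file names; the real dataset has ~140 pairs
    example_pairs = [
        ("sample1_1.fastq.gz", "sample1_2.fastq.gz"),
        ("sample2_1.fastq.gz", "sample2_2.fastq.gz"),
    ]
    # Print the command instead of running it, so it can be inspected first
    print(shlex.join(build_megahit_command(example_pairs)))
```

The helper only builds and prints the command; to launch it, the returned list could be passed to `subprocess.run`.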
- Do you have a rough idea of how long it will take for the entire assembly process to finish on a metagenomic dataset of such size?
- Do you have any additional assembly tips for my particular dataset, besides the ones presented here?
- Are there any pre-assembly steps that you would recommend, e.g. quality score filtering? Would this result in a significant improvement in computational time?
Thanks in advance!