I have 27 soil WGS metagenome datasets and I am trying to assemble them into contigs that are at least 1000-2000 bp long. Each dataset on its own is 20-30 GB of paired-end FASTQ files.
I first tried the Ray Meta assembler because it's supposed to run well in parallel. It finished for most datasets, but the contigs are very short (most are <500 bp). I then found this paper, which suggests that Ray Meta does better on low-complexity datasets.
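In case it helps to compare numbers across assemblers, here is a minimal sketch of how contig lengths could be summarised from an assembler's FASTA output. It is not tied to Ray Meta or any other tool: the function names and the `min_len` cutoff are placeholders chosen for illustration, and it assumes a plain (uncompressed) FASTA stream.

```python
import io


def read_fasta_lengths(handle):
    """Yield the sequence length of each record in a FASTA text stream."""
    length = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for n in ordered:
        running += n
        if running >= half:
            return n
    return 0


def summarize(lengths, min_len):
    """Basic assembly statistics; min_len is a hypothetical length cutoff in bases."""
    lengths = list(lengths)
    kept = [n for n in lengths if n >= min_len]
    return {
        "contigs": len(lengths),
        "total_bases": sum(lengths),
        "n50": n50(lengths),
        "contigs_at_least_min_len": len(kept),
        "bases_at_least_min_len": sum(kept),
    }


# Tiny in-memory example; for a real assembly, pass an open file handle instead.
demo = io.StringIO(">c1\nACGTAC\n>c2\nAC\n>c3\nACGTACGTAC\n")
print(summarize(read_fasta_lengths(demo), min_len=5))
```

Running the same summary on each assembler's output, with the same cutoff, would at least make the comparison between tools consistent.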
I also took a look at CONCOCT, and I think the strategy makes sense, but the code on their GitHub page looks woefully outdated and I'm not sure how much of it is still maintained. It also suggests pooling all datasets and assembling them together (a "coassembly"), then using that assembly for the downstream analysis, but that approach will be computationally challenging because my datasets are so large.
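To put a rough number on that, here is a back-of-the-envelope calculation of the pooled input size, using only the figures above (27 datasets, 20-30 GB each). It says nothing about the memory an assembler would actually need, which depends on the tool and on how many distinct k-mers the data contain. The function name is just a placeholder.

```python
N_DATASETS = 27
SIZE_RANGE_GB = (20, 30)  # per-dataset FASTQ size, from the description above


def pooled_size_gb(n_datasets, size_range_gb):
    """Return (low, high) total input size in GB if all datasets are pooled."""
    low, high = size_range_gb
    return n_datasets * low, n_datasets * high


low, high = pooled_size_gb(N_DATASETS, SIZE_RANGE_GB)
print(f"Pooled FASTQ input for a coassembly: {low}-{high} GB")
```

So a full coassembly would start from roughly 540-810 GB of reads.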
If anyone has experience assembling complex soil datasets, I'd love to do some brainstorming, so please reach out!