Questions about assembly of large metagenomics dataset

5.4 years ago

zorrilla • 0

Hi,

I am attempting to assemble the dataset from ERP002469 using MEGAHIT. The dataset consists of ~140 paired-end FASTQ files, each between 2 and 10 GB in size, about 1 TB in total.

I am using the k list 27,37,47,57,67,77,87,97,107,117 and am currently running the assembly on a 512 GB RAM node with 20 cores. It has been running for around 30 hours, and the last log entry is: Assembling contigs from SdBG for k = 37 ---
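For reference, the run described above can be sketched as a small Python helper that builds the k list and the MEGAHIT argument list without executing anything. The file names and output directory are hypothetical placeholders, and the odd-k check reflects my understanding of MEGAHIT's `--k-list` requirements rather than anything stated in this thread.

```python
import shlex


def build_k_list(k_min, k_max, k_step):
    """Return k-mer sizes from k_min to k_max (inclusive) in steps of k_step."""
    # Assumption: every k must be odd, so start odd and step by an even amount.
    if k_min % 2 == 0 or k_step % 2 != 0:
        raise ValueError("k_min must be odd and k_step even so that every k is odd")
    return list(range(k_min, k_max + 1, k_step))


def megahit_command(r1_files, r2_files, out_dir, k_list, threads):
    """Build an argument list for MEGAHIT; nothing is executed here."""
    return [
        "megahit",
        "-1", ",".join(r1_files),   # comma-separated forward-read files
        "-2", ",".join(r2_files),   # comma-separated reverse-read files
        "--k-list", ",".join(str(k) for k in k_list),
        "-t", str(threads),
        "-o", out_dir,
    ]


# Hypothetical file names; the k list and thread count match the post.
k_list = build_k_list(27, 117, 10)
cmd = megahit_command(
    ["sample1_1.fastq.gz"], ["sample1_2.fastq.gz"], "megahit_out", k_list, 20
)
print(shlex.join(cmd))
```

Building the command as a list makes it easy to try a shorter k list on a subset of samples before committing to the full 1 TB run.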

My questions:

Do you have a rough idea of how long the entire assembly process will take on a metagenomic dataset of this size?
Do you have any additional assembly tips for my particular dataset, besides the ones presented here?
Are there any pre-assembly steps that you would recommend, e.g. quality score filtering? Would this significantly reduce the computational time?

Thanks in advance!

assembly metagenomics megahit • 1.1k views

updated 11 months ago by Ram 43k • written 5.4 years ago by zorrilla • 0

I have no idea about runtimes, but that seems slow. Try different k-mer sizes; I would expect the larger k-mers to do better, i.e. give longer contigs.
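One way to check whether a different k-mer list really gives longer contigs is to compare a summary statistic such as N50 between runs. The sketch below is a minimal illustration; the contig lengths are made up, and in practice they would be read from each run's contig FASTA.

```python
def n50(contig_lengths):
    """Return the length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty input


# Made-up contig lengths for two hypothetical runs.
run_small_k = [100, 200, 300, 400, 500]
run_large_k = [200, 300, 1000]
print(n50(run_small_k), n50(run_large_k))
```

N50 alone can be misleading for metagenomes, so it is worth looking at total assembled length and contig count alongside it.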

One thing first: you have trimmed the dataset already, right? (Essential!)
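To make the idea of trimming concrete, here is a deliberately simple Python sketch that cuts low-quality bases from the 3' end of a single read. The threshold of 20 and the Phred+33 offset are assumptions, and dedicated trimming tools do considerably more (adapter removal, for example), so treat this as an illustration rather than a replacement for them.

```python
def trim_3prime(seq, qual, min_q=20, phred_offset=33):
    """Remove bases from the 3' end while their Phred score is below min_q."""
    end = len(seq)
    # Walk backwards from the last base until a base meets the threshold.
    while end > 0 and ord(qual[end - 1]) - phred_offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# 'I' encodes Phred 40 and '#' encodes Phred 2 under the Phred+33 assumption.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIIII##")
print(trimmed_seq, trimmed_qual)
```

Removing low-quality tails reduces the number of erroneous k-mers the assembler has to store, which is the usual argument for trimming before a large assembly.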

5.4 years ago by colindaven 6.4k
