Questions about assembly of large metagenomics dataset
3.0 years ago
zorrilla • 0


I am attempting to assemble the dataset from ERP002469 using MEGAHIT. The dataset consists of ~140 paired-end FASTQ files, each between 2 and 10 GB in size, for about 1 TB in total.

I am using the k list 27,37,47,57,67,77,87,97,107,117 and currently running the assembly on a node with 512 GB of RAM and 20 cores. It has been running for around 30 hours, and the last log entry is: Assembling contigs from SdBG for k = 37 ---
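For reference, here is a minimal Python sketch of how a run like this could be put together. It is an illustration under assumptions, not the exact command used: the file names and the output directory are hypothetical placeholders, and the k-list checks (odd values, range 15-255, steps of at most 28) follow MEGAHIT's help text as I recall it, so verify them against `megahit --help` for your version.

```python
import shlex


def validate_k_list(k_list):
    """Check a k list against the constraints MEGAHIT is assumed to enforce:
    every k odd and within 15..255, strictly increasing, step of at most 28."""
    for k in k_list:
        if k % 2 == 0 or not 15 <= k <= 255:
            raise ValueError(f"invalid k-mer size: {k}")
    for smaller, larger in zip(k_list, k_list[1:]):
        if not 0 < larger - smaller <= 28:
            raise ValueError(f"invalid step between k={smaller} and k={larger}")
    return k_list


def build_megahit_command(r1_files, r2_files, k_list, threads, out_dir):
    """Build the argument list for a paired-end MEGAHIT run.
    -1/-2 take comma-separated lists of mate files in matching order."""
    if len(r1_files) != len(r2_files):
        raise ValueError("each forward file needs a matching reverse file")
    validate_k_list(k_list)
    return [
        "megahit",
        "-1", ",".join(r1_files),
        "-2", ",".join(r2_files),
        "--k-list", ",".join(str(k) for k in k_list),
        "-t", str(threads),
        "-o", out_dir,
    ]


# The k list from the post: 27 to 117 in steps of 10.
k_list = list(range(27, 118, 10))

# Hypothetical file names; the real dataset has ~140 pairs.
cmd = build_megahit_command(
    ["sample1_1.fastq.gz", "sample2_1.fastq.gz"],
    ["sample1_2.fastq.gz", "sample2_2.fastq.gz"],
    k_list,
    threads=20,
    out_dir="megahit_out",
)
print(shlex.join(cmd))
```

The list has ten k values, so a log entry at k = 37 corresponds to the second of ten iterations.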

My questions:

  • Do you have a rough idea of how long the entire assembly will take to finish on a metagenomic dataset of this size?
  • Do you have any additional assembly tips for my particular dataset, besides the ones presented here?
  • Are there any pre-assembly steps that you would recommend, e.g. quality score filtering? Would this significantly reduce the computation time?

Thanks in advance!

assembly bioinformatics megahit metagenomics • 699 views

No idea about runtimes, but it seems slow. Try different k-mer sizes; I would expect the larger k-mers to do better, i.e. give longer contigs.

One thing first: you have trimmed the dataset, right? (Essential!)
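To make "trimming" concrete, here is a minimal Python sketch of 3'-end quality trimming for a read pair. It only illustrates the idea: the Phred+33 encoding, the Q20 cutoff, and the 50 bp minimum length are assumptions chosen for the example, and the function names are hypothetical. For a 1 TB dataset you would use a dedicated trimming tool rather than pure Python.

```python
def trim_trailing_low_quality(seq, qual, min_q=20, offset=33):
    """Remove bases from the 3' end while their Phred score is below min_q.
    qual is the FASTQ quality string, assumed to be Phred+33 encoded."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def filter_pair(read1, read2, min_len=50, min_q=20):
    """Trim both mates and keep the pair only if both are still long enough.
    Each read is a (sequence, quality) tuple; returns None if the pair is dropped,
    so the two output files stay in sync."""
    trimmed1 = trim_trailing_low_quality(*read1, min_q=min_q)
    trimmed2 = trim_trailing_low_quality(*read2, min_q=min_q)
    if len(trimmed1[0]) < min_len or len(trimmed2[0]) < min_len:
        return None
    return trimmed1, trimmed2


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
example = trim_trailing_low_quality("ACGTACGT", "IIIII###")
print(example)
```

Dropping both mates together when either one becomes too short keeps the paired files consistent, which paired-end assemblers expect.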

