I am trying to assemble a plant genome on AWS using Velvet. The genome is huge (more than 10 times the size of the human genome) and coverage is around 30x. We are planning a de novo assembly with Velvet (with threading enabled). I would like to know if there is a calculator that can estimate the time an assembly will take, given the necessary details. For example, if I provide the RAM, CPU (instance type), genome size, approximate coverage, type of sequencing (PE or SE) and number of client nodes, it should give me the number of hours or days the assembly (reference-based and/or de novo) would take.
There is no simple answer to your question. Other factors also influence time and memory use, such as the ploidy of the genome, heterozygosity, repeat content and read quality, among others. I will illustrate with two examples:
In one case, I assembled four bacterial genomes with SPAdes, all different strains from the same genus, with similar sequencing coverage (100x) for all. Three of them finished in 2-4 hours; the last one took more than a day. The culprits were library insert size and sequencing quality, which were worse than for the other three.
In a second case, I assembled two insect genomes with SGA. They were sister species with similar genome sizes and coverage (20x). One took 3-4 days, the other took a month to complete. Although the genomes were similar in size, one had more repeats than the other, and apparently this threw SGA off track.
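Run time is hard to predict, but the amount of input data is not, and it is a useful first number when sizing instances. The sketch below is only back-of-the-envelope arithmetic (coverage = reads x read length / genome size), not a run-time calculator. The human genome size of about 3.1 Gbp and the 150 bp read length are assumptions for illustration, since the question does not state them, and the function name is made up.

```python
import math

# Assumption: approximate haploid human genome size, in base pairs.
HUMAN_GENOME_BP = 3_100_000_000


def sequencing_volume(genome_size_bp, coverage, read_length_bp):
    """Return (total sequenced bases, number of reads) for a given coverage.

    Uses coverage = n_reads * read_length / genome_size, solved for n_reads.
    """
    total_bases = genome_size_bp * coverage
    n_reads = math.ceil(total_bases / read_length_bp)
    return total_bases, n_reads


# Figures from the question: genome about 10x human, 30x coverage.
# Assumption: 150 bp reads (not stated in the question).
genome_bp = 10 * HUMAN_GENOME_BP
total_bases, n_reads = sequencing_volume(genome_bp, coverage=30, read_length_bp=150)

print(f"Genome size: {genome_bp / 1e9:.0f} Gbp")
print(f"Total bases: {total_bases / 1e9:.0f} Gbp")
print(f"Reads:       {n_reads / 1e9:.1f} billion")
```

With these assumed values, that is on the order of 930 Gbp of sequence in several billion reads, before any of the genome-specific factors above come into play.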
P.S.: Velvet is a good assembler and, in its time, it was among the best available. However, its development has stopped and it has been surpassed by others, especially in terms of time and memory use.