I am trying to assemble a plant genome on AWS using Velvet. The genome is huge (more than 10 times the size of the human genome) and coverage is around 30x. We are planning a de novo assembly with Velvet (with threading enabled). I would like to know whether there is a calculator that can estimate the time an assembly will take, given the necessary details. For example, if I supply the RAM, CPU (instance type), genome size, approximate coverage, type of sequencing (PE or SE) and number of client nodes, it should give me the number of hours or days the assembly would take (reference-based and/or de novo).
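To give a sense of scale, here is a rough back-of-envelope sketch of the input size implied by those numbers. It assumes a haploid human genome of about 3.1 Gbp and treats "10 times human" as a lower bound; the function name is just illustrative. It only estimates how many bases the assembler has to read, not how long the assembly will take.

```python
# Back-of-envelope input size; all figures are rough assumptions.
HUMAN_GENOME_BP = 3.1e9  # assumed approximate haploid human genome size


def total_read_bases(genome_bp: float, coverage: float) -> float:
    """Total sequenced bases = genome size x average coverage."""
    return genome_bp * coverage


# "More than 10 times the human genome" is taken as a lower bound here.
genome_bp = 10 * HUMAN_GENOME_BP
bases = total_read_bases(genome_bp, coverage=30)

print(f"genome: {genome_bp / 1e9:.0f} Gbp, reads: {bases / 1e12:.2f} Tbp")
```

So the assembler would need to process on the order of a trillion bases of read data, before any consideration of repeats, ploidy or read quality.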
Well, that question got downvoted, and hmon suggested moving the query to bioinformatics.stackexchange. I have moved it to Bioinformatics Stack Exchange. Thanks, hmon.
There is no simple answer to your question. Other factors also influence time and memory use, such as the ploidy of the genome, heterozygosity, repetitive element content and read quality, among others. I will illustrate with two examples:
In one case, I assembled four bacterial genomes with SPAdes, all different strains from the same genus, with similar sequencing coverage (100x) for all. Three of them finished in 2-4 hours; the last one took more than a day. The culprit was the library insert size and sequencing quality, which were worse than for the other three.
In a second case, I assembled two insect genomes with SGA. They were sister species with similar genome sizes and coverage (20x). One took 3-4 days, the other took one month to complete. Although the genomes were similar in size, one had more repeats than the other, and apparently this threw SGA off its tracks.
P.S.: Velvet is a good assembler and, in its time, was among the best available. However, its development has stopped and it has been surpassed by others, especially in terms of time and memory use.
I understand that it is tricky to estimate assembly time and that several factors influence the assembly. It seems there is no such tool. Regarding your P.S., which assembler would you suggest for polyploid genomes, in terms of memory management and resource use? By the way, thanks for your time. A recent (2018) paper on assemblers suggests that Velvet and ABySS are better assemblers for eukaryotic genomes (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5826002/ - A Comprehensive Study of De Novo Genome Assemblers: Current Challenges and Future Prospective).
Since there was no reply, I have reposted this on SO here: https://stackoverflow.com/questions/53314743/calculating-alignment-mapping-time