Question: Calculating alignment/mapping time
10 months ago, kumars.sv wrote:

I am trying to assemble a plant genome with Velvet on AWS resources. The genome is huge (more than 10 times the size of the human genome) and coverage is around 30x. We are planning a de novo assembly with Velvet (threads enabled). I would like to know whether there is a calculator that can estimate the time an assembly will take, given the necessary details. For example, if I supply RAM, CPU (instance type), genome size, approximate coverage, type of sequencing (PE or SE), and number of client nodes, it should give me the number of hours or days the assembly would take (reference-based and/or de novo).
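
For a sense of scale, the figures above can be turned into a raw data volume using the definition coverage = total sequenced bases / genome size. This is only a rough sketch and says nothing about runtime. The human genome size (about 3.1 Gbp) and the 150 bp read length are assumptions for illustration, not values from the post, and "10 times human" is taken as a lower bound.

```python
# Back-of-the-envelope data volume for the assembly described above.
# Assumptions (not from the original post): human genome ~3.1 Gbp,
# reads are 150 bp long.

HUMAN_GENOME_BP = 3_100_000_000  # approximate size; assumption
READ_LENGTH_BP = 150             # hypothetical read length


def total_bases(genome_size_bp: int, coverage: int) -> int:
    """Bases to sequence, from coverage = total bases / genome size."""
    return genome_size_bp * coverage


def read_count(total_bp: int, read_length_bp: int) -> int:
    """Number of reads needed to supply total_bp bases (rounded up)."""
    return -(-total_bp // read_length_bp)


# "More than 10 times human" is treated here as a lower bound.
plant_genome_bp = 10 * HUMAN_GENOME_BP
bases = total_bases(plant_genome_bp, 30)
reads = read_count(bases, READ_LENGTH_BP)

print(f"genome size : {plant_genome_bp / 1e9:.1f} Gbp")
print(f"total bases : {bases / 1e9:.0f} Gbp")
print(f"reads needed: {reads / 1e9:.1f} billion")
```

Under these assumptions the input is on the order of 930 Gbp of sequence, or roughly 6.2 billion reads, before any quality filtering.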

modified 10 months ago by h.mon • written 10 months ago by kumars.sv

Since there was no reply, I have reposted this on SO here: https://stackoverflow.com/questions/53314743/calculating-alignment-mapping-time

written 10 months ago by kumars.sv

Well, that question got downvoted, and h.mon suggested moving the query to bioinformatics.stackexchange, so I have moved it to Bioinformatics Stack Exchange. Thank you, h.mon.

written 10 months ago by kumars.sv

10 months ago, h.mon (Brazil) wrote:

There is no simple answer to your question. Other factors also influence time and memory use, such as genome ploidy, heterozygosity, repetitive element content, and read quality. I will illustrate with two examples:

  • In one case, I assembled four bacterial genomes with SPAdes, all different strains from the same genus, with similar sequencing coverage (100x) for all. Three of them finished in 2-4 hours; the last one took more than a day. The culprits were library insert size and sequencing quality, which were worse than those of the other three.

  • The second case involved two insect genomes from sister species, with similar genome sizes and coverage (20x). I assembled both with SGA: one took 3-4 days, the other took one month to complete. Although the genomes were similar in size, one had more repeats than the other, and apparently this threw SGA off its tracks.

P.S.: Velvet is a good assembler and, in its time, it was among the best available. However, its development has stopped and it has been surpassed by others, especially in terms of time and memory use.

modified 10 months ago • written 10 months ago by h.mon

I understand that it is tricky to estimate assembly time and that several factors influence the assembly. It seems there is no such tool. Regarding your P.S., which assembler would you suggest for polyploid genomes, in terms of memory management and resources? By the way, thanks for your time. A recent paper (2018) on assemblers reports that Velvet and ABySS are better assemblers for eukaryotic genomes (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5826002/ - A Comprehensive Study of De Novo Genome Assemblers: Current Challenges and Future Prospective).

modified 10 months ago • written 10 months ago by kumars.sv