Forum: Computer resources for assembly software
joprietoe wrote (3 months ago):

Many papers claim that new approaches will be needed for genome sequence assemblers to scale with the amount of data being generated. But will distributed assemblers like Lazer, Spaler, Ray, and SWAP actually be necessary? A server with moderate resources (500 GB RAM) can already deal with very large datasets. For example, encoding the Iowa prairie soil metagenomics dataset takes 345 GB of memory with MEGAHIT and only 29 GB with Minia [MEGAHIT: https://academic.oup.com/bioinformatics/article/31/10/1674/177884].
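Part of why Minia can be so frugal is that the original Minia paper stores the set of k-mers probabilistically in a Bloom filter rather than exactly (MEGAHIT instead uses a succinct de Bruijn graph). The toy sketch below shows only the Bloom filter idea: inserted k-mers are always found, a few absent k-mers may be reported as present, and memory is a fixed number of bits per k-mer. The class name, the 10 bits per k-mer, and the BLAKE2b-based hashing are illustrative assumptions, not Minia's implementation; Minia additionally tracks "critical false positives" so graph traversal stays exact, which is omitted here.

```python
import hashlib
import math


class KmerBloomFilter:
    """Toy Bloom filter over k-mers (hypothetical, for illustration only):
    never misses an inserted k-mer, but may report a few k-mers that were
    never inserted (false positives)."""

    def __init__(self, expected_items: int, bits_per_item: float = 10.0):
        self.num_bits = max(8, int(expected_items * bits_per_item))
        # Standard choice: about (bits per item) * ln 2 hash functions.
        self.num_hashes = max(1, round(bits_per_item * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, kmer: str):
        # Derive num_hashes bit positions from seeded BLAKE2b digests.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(f"{seed}:{kmer}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, kmer: str) -> None:
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(kmer))


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    reads = ["ACGTACGTTGCA", "GTTGCATTACGG"]  # hypothetical toy reads
    bloom = KmerBloomFilter(expected_items=100)
    for read in reads:
        for kmer in kmers(read, 5):
            bloom.add(kmer)
    print("GTTGC" in bloom)  # True: inserted k-mers are always found
    print(len(bloom.bits), "bytes for up to 100 k-mers")
```

At 10 bits per k-mer this uses about six times fewer bits than packing each 31-mer exactly at 2 bits per base (62 bits), before counting any hash-table overhead. The trade-off, as the reply below points out, is that memory alone says nothing about assembly quality.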

h.mon (Brazil) wrote (3 months ago):

If we were discussing this over a beer, I could happily go on for hours. Since Biostars is better suited to focused questions and answers, I will make just a couple of remarks:

First and foremost, your post ignores the fact that the MEGAHIT assembly is almost one order of magnitude better than the Minia assembly on several metrics.
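The post does not list the metrics; contiguity statistics such as N50 are a common example of what assembler comparisons report (an assumption about which metrics are meant here). A minimal sketch of N50, plus two hypothetical assemblies with the same total length but very different contiguity:

```python
def n50(contig_lengths):
    """Return N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


if __name__ == "__main__":
    # Hypothetical assemblies, both totalling 100 kb.
    fragmented = [5_000] * 20
    contiguous = [50_000, 30_000, 20_000]
    print(n50(fragmented))  # 5000
    print(n50(contiguous))  # 50000
```

Both toy assemblies have the same total size, yet their N50 differs tenfold, which is why a memory comparison needs to be read alongside quality metrics like this.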

Many papers claim what they claim so they get published. The claim may even be justified, but technology and science move fast, and some 3-year-old claims that were justified at the time may be obsolete today.

As sequencers move to longer and better (fewer errors) reads, assembly will become an easier problem (some people even think it will be trivial in the near future), and there won't be much need for special hardware.
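A toy illustration of why longer reads help: k-mer length cannot exceed read length, and once k is longer than a repeat, each k-mer includes a flanking base that tells the copies apart. The genome and the 7 bp repeat below are hypothetical, and the sketch ignores sequencing errors, coverage, and the much longer repeats of real genomes.

```python
from collections import Counter


def repeated_kmers(seq: str, k: int) -> set:
    """Return the k-mers that occur more than once in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer for kmer, count in counts.items() if count > 1}


def smallest_unique_k(seq: str) -> int:
    """Smallest k at which every k-mer of seq is unique (0 for an empty seq)."""
    for k in range(1, len(seq) + 1):
        if not repeated_kmers(seq, k):
            return k
    return 0


if __name__ == "__main__":
    repeat = "ACGTGCA"  # hypothetical 7 bp repeat
    genome = "TTT" + repeat + "GGG" + repeat + "CCC"
    print(sorted(repeated_kmers(genome, 7)))  # ['ACGTGCA']
    print(smallest_unique_k(genome))  # 8
```

At k = 7 the two repeat copies collapse into one k-mer; at k = 8 every k-mer is unique, so a de Bruijn graph built at that k would keep the copies apart.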


AFAIK Illumina has no announced product that produces reads longer than 300 bp. While read lengths are getting longer with PacBio/Nanopore, the problem of relatively high error rates will likely be around for some time to come.

10x Genomics recently held a webinar in which they claimed to have obtained diploid assemblies (not fully resolved) for many genomes, including human, with their Supernova assembler. Their published hardware requirements for running Supernova on human-sized genomes are relatively modest (16+ cores, 256 GB RAM).

Sequencing alone may not solve the assembly problem completely, but technologies like Bionano Genomics optical mapping are poised to provide a major assist.

Reply by genomax, 3 months ago.