Why do we need a distributed assembler based on MPI/MapReduce when we can store a de Bruijn graph compactly on a single node using FM-index, Bloom filter, and succinct graph structure techniques?

7.2 years ago

saranpons3 ▴ 70

Hello all, I would like to know why we need a distributed assembler based on MPI/MapReduce when we can store a de Bruijn graph compactly using FM-index, Bloom filter, and succinct graph structure techniques, and assemble human genomes on a single-node computer with little RAM. For example, the Minia assembler (http://minia.genouest.org/) used only 5.7 GB of RAM to assemble a human genome. I would also like to know whether the assembly running time of distributed assemblers based on MPI/MapReduce is better than that of assemblers based on FM-index, Bloom filter, and succinct graph structures.

Assembly denovo debruijn • 1.5k views

Who told you that you need a distributed assembler? As far as I know, there are very few MapReduce-based tools available at the moment.

7.2 years ago by Giovanni M Dall'Olio 28k

Some of the earlier assemblers didn't yet use de Bruijn graphs or the more memory-efficient methods. Since memory on individual cluster nodes is often limited, people tended to use MPI to get access to more memory across nodes. I don't know of any that used MapReduce, though. Anyway, with the advent of Minia and similar tools, this isn't really needed any more.

7.2 years ago by Devon Ryan 104k
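
To make the memory argument in this thread concrete, here is a minimal sketch of the Bloom filter idea: the k-mers are inserted into a fixed-size bit array, and the de Bruijn graph is never stored explicitly, because neighbours are found by querying the four possible one-base extensions. This is not Minia's actual code. The names `BloomFilter`, `build_filter`, and `successors`, and the default sizes, are assumptions made for illustration. A plain Bloom filter also returns occasional false positives, which real tools have to handle separately and this sketch does not.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with num_hashes hash positions per item (illustrative)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May return True for an item never added (false positive), never False for one that was.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_filter(reads, k, num_bits=1 << 16, num_hashes=4):
    """Insert all k-mers of all reads into one Bloom filter (sizes are arbitrary defaults)."""
    bloom = BloomFilter(num_bits, num_hashes)
    for read in reads:
        for kmer in kmers(read, k):
            bloom.add(kmer)
    return bloom


def successors(bloom, kmer):
    """Outgoing neighbours of kmer: test the four possible extensions for membership."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bloom]


if __name__ == "__main__":
    bloom = build_filter(["ACGTACGGA"], k=4)
    print(len(bloom.bits), "bytes used for the filter")
    print(successors(bloom, "ACGT"))
```

The point of the design is that memory depends on the chosen number of bits per k-mer rather than on storing each k-mer and its edges, which is why a single node can be enough.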
