Merging Datasets
12.0 years ago
khikho ▴ 100

To prepare a larger dataset for Velvet, I merged two files of unmapped human reads, and now I have 3 questions:
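
For context, the merge step itself is just concatenation for plain FASTQ, since each record is self-contained; a minimal sketch (file names are hypothetical, and it assumes uncompressed input; plain `cat` gives the same result):

```python
# Minimal sketch of the merge step, assuming plain uncompressed FASTQ
# input; file names are hypothetical. For flat files, `cat` is equivalent.
import shutil

inputs = ["unmapped_1.fastq", "unmapped_2.fastq"]

with open("merged.fastq", "wb") as out:
    for path in inputs:
        with open(path, "rb") as fh:
            shutil.copyfileobj(fh, out)  # stream copy, constant memory
```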

1) Is this the right way to improve the quality of Velvet's output (I mean the length of the contigs it produces)?

2) After merging the files, the new file contains nearly 320,000,000 reads, but when I ran Velvet on a machine with 72 GB of memory it used all of it and the run time grew badly (nearly 4 days just to complete the hashing step). Do you know how much memory I should allocate?

3) Can I split this new file into 320 files, run Velvet on each of them in parallel, and at the end merge the Velvet outputs to use as a single assembly result?

P.S.: I am using the colorspace version of Velvet.

velvet next-gen sequencing assembly human
12.0 years ago
Nikolay Vyahhi ★ 1.3k

1) Yes, it is. The more data you have, the better the results.

2) You can try a smaller k to fit in memory, but assembling a mammalian genome with Velvet is all but impossible. Assembling a human genome in 72 GB of RAM is also close to impossible with any other de novo assembler; it usually takes 200+ GB of RAM, depending on the dataset.
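
For a rough sense of scale, here is a sketch of the memory rule of thumb Simon Gladman posted to the velvet-users list. The coefficients are empirical and approximate, and the 50 bp read length and k = 31 below are illustrative assumptions, not values from the question:

```python
# Back-of-the-envelope Velvet peak-memory estimate, based on the empirical
# rule of thumb from the velvet-users list. Treat the result as an
# order-of-magnitude guide, not a guarantee.

def velvet_ram_kb(read_len, genome_mb, num_reads_m, k):
    """Estimated velvetg peak RAM in kB.

    read_len    -- read length in bases
    genome_mb   -- genome size in megabases
    num_reads_m -- number of reads, in millions
    k           -- hash (k-mer) length
    """
    return (-109635
            + 18977 * read_len
            + 86326 * genome_mb
            + 233353 * num_reads_m
            - 51092 * k)

# The dataset above: ~320 million reads against a ~3,200 Mb human genome.
# Read length 50 bp and k = 31 are assumptions for illustration.
kb = velvet_ram_kb(read_len=50, genome_mb=3200, num_reads_m=320, k=31)
print(f"~{kb / 1024 ** 2:.0f} GB")  # prints ~334 GB for these inputs
```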

Consider one of the following:

  • Assembling by reference (Bowtie / BWA) instead of de novo (see the sketch after this list).
  • Using ALLPATHS / SOAPdenovo.
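
As a sketch of the reference-guided route, BWA's classic single-end aln/samse pipeline looks roughly like this. File names are hypothetical, and it assumes nucleotide-space reads; SOLiD colorspace data needs an aligner mode that understands colorspace:

```python
# Sketch of reference-guided mapping with BWA's classic single-end
# aln/samse pipeline. File names are hypothetical; assumes
# nucleotide-space (not colorspace) reads.
import subprocess

ref = "hg19.fa"          # reference FASTA (hypothetical path)
reads = "merged.fastq"   # the merged read file from the question

# Build the FM-index for the reference (done once).
subprocess.run(["bwa", "index", ref], check=True)

# Align reads; `bwa aln` writes its .sai output to stdout.
with open("merged.sai", "wb") as sai:
    subprocess.run(["bwa", "aln", ref, reads], stdout=sai, check=True)

# Convert the alignments to SAM.
with open("merged.sam", "wb") as sam:
    subprocess.run(["bwa", "samse", ref, "merged.sai", reads],
                   stdout=sam, check=True)
```

Mapping keeps memory bounded by the size of the reference index (a few GB for human), which is why it remains feasible on a 72 GB machine where de novo assembly does not.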

3) Definitely not: each chunk would contain only a fraction of the overlapping reads, so every per-chunk de Bruijn graph would be badly fragmented, and concatenating their contigs does not give a valid joint assembly.
