Question

efficient algorithm for de novo assembly using de bruijn graph

5.8 years ago

StudentBio • 0

Hello everybody, I am trying to assemble my reads with a de novo assembler based on a De Bruijn graph, and I don't know which de novo algorithm would be most efficient in my case. The optimal k according to KmerGenie is 17. Read data: Ion Torrent, single-end, GC content 42%, read lengths between 20 and 397 bp.

De Bruijn k-mers Assembly date palm • 1.5k views

updated 5.8 years ago by colindaven 6.4k • written 5.8 years ago by StudentBio • 0

What organism is it? How many reads do you have?

You can try SPAdes, which does its own rounds of k-mer optimisation.

5.8 years ago by Joe 21k

Phoenix dactylifera; genome length 22,953,390 bp; heterozygosity 4.25%

5.8 years ago by StudentBio • 0

Usually there is no "better" assembler, and each time you have a new dataset, a different assembler may turn out to be the best. In addition, more information would help in getting good suggestions for your case, such as expected genome size, expected sequencing coverage, and the ploidy and heterozygosity of the organism, among others. For example, some assemblers are optimized for small or medium-sized genomes (such as SPAdes and MIRA), while others are good for large genomes.

Are you assembling several genomes? On a different thread ( choice of k value for mapping my reads again a reference genome ), you found 81 as the best k-mer size according to KmerGenie.
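As background for why the choice of k matters with reads of 20 to 397 bp: a read shorter than k contributes no k-mers to a De Bruijn graph at all. The following is a minimal Python sketch, not how any particular assembler is implemented. The reads and read lengths are invented for illustration, and the helper names (`kmers`, `de_bruijn_edges`, `usable_fraction`) are hypothetical.

```python
from collections import defaultdict


def kmers(read, k):
    """Yield every k-mer of a read; reads shorter than k yield nothing."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes following it."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def usable_fraction(read_lengths, k):
    """Fraction of reads long enough to contain at least one k-mer."""
    return sum(1 for n in read_lengths if n >= k) / len(read_lengths)


# Toy reads, invented for illustration only; the last one is shorter than k.
reads = ["ACGTACGA", "CGTACGAT", "ACG"]
graph = de_bruijn_edges(reads, k=4)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))

# Hypothetical read lengths inside the 20-397 bp range from the question.
lengths = [20, 50, 80, 150, 397]
for k in (17, 81):
    print(f"k={k}: {usable_fraction(lengths, k):.0%} of reads usable")
```

With a real dataset, the same `usable_fraction` check on the actual read length distribution would show how many reads are discarded at k=81 compared with k=17.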

5.8 years ago by h.mon 35k

score 1 · Answer 1 · 2018-07-11

That's a really poor dataset for de novo assembly, as it contains only single-end reads. Also, Ion Torrent data contains lots of nasty indels, which cause artificial frameshifts in predicted genes. Lastly, there is no long-range information.

What coverage do you have?

I would advise you to try SOAPdenovo2 and SPAdes. Potentially also ABySS.

By the way, shouldn't the genome be more like 600+ Mbp? https://www.nature.com/articles/ncomms3274
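For a rough answer to the coverage question, mean depth can be approximated as total sequenced bases divided by genome size. The sketch below is a minimal Python illustration of that arithmetic. The read set is hypothetical (it is not the poster's data) and `estimate_coverage` is an illustrative helper name; the two genome sizes are the figure given in the thread and the approximate size suggested above.

```python
def estimate_coverage(read_lengths, genome_size_bp):
    """Return mean depth: total sequenced bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return sum(read_lengths) / genome_size_bp


# Hypothetical read set: 1,000,000 reads of 200 bp each (assumption for the demo).
read_lengths = [200] * 1_000_000

genome_sizes = [
    ("22,953,390 bp (size given in the thread)", 22_953_390),
    ("600 Mbp (approximate size suggested above)", 600_000_000),
]
for label, size in genome_sizes:
    print(f"{label}: {estimate_coverage(read_lengths, size):.2f}x")
```

The same number of reads gives a very different depth depending on which genome size is correct, so it is worth settling that before choosing an assembler or a k value.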