Question: De Bruijn graph for paired-end reads
3.1 years ago
Star70 wrote:

I have a set of paired-end reads from an unknown species, and I want to assemble them to recover the genome. I have a program based on the basic de Bruijn graph algorithm, which is designed for single reads. Is there another classic algorithm (or another version of the de Bruijn graph algorithm) that works better for paired-end reads? I also have three sets of these reads, from three samples of the same species. Could they help me improve my assembly?

written 3.1 years ago by Star70
3.1 years ago
Philipp Bayer wrote:

There are heaps of de Bruijn graph implementations for paired-end reads:

DISCOVAR (needs 250 bp paired reads, PCR-free)

ALLPATHS-LG (needs mate-pair data)

and many, many more

As for your three sets of reads from three samples, do you expect the samples to be different? If they're replicates (identical), you can treat them as three libraries in the assembler and assemble them all together.
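As a rough illustration of "three libraries in one assembly", here is a small Python sketch that builds a command line for SPAdes, one assembler that accepts several paired-end libraries in a single run. It assumes the `--pe<#>-1` / `--pe<#>-2` and `-o` options described in the SPAdes manual (check `spades.py --help` for your version). The function name, file names, and output directory are made up for the example.

```python
def spades_command(libraries, out_dir):
    """Build a spades.py argument list for several paired-end libraries.

    libraries: list of (forward_fastq, reverse_fastq) tuples, one per sample.
    out_dir:   directory where the assembler should write its results.
    """
    cmd = ["spades.py", "-o", out_dir]
    for number, (forward, reverse) in enumerate(libraries, start=1):
        # Each library gets its own number: --pe1-1/--pe1-2, --pe2-1/--pe2-2, ...
        cmd += [f"--pe{number}-1", forward, f"--pe{number}-2", reverse]
    return cmd


# Hypothetical file names for the three replicate samples.
replicates = [
    ("sample1_R1.fastq", "sample1_R2.fastq"),
    ("sample2_R1.fastq", "sample2_R2.fastq"),
    ("sample3_R1.fastq", "sample3_R2.fastq"),
]
command = spades_command(replicates, "assembly_out")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where SPAdes is installed. Keeping the libraries separate, rather than concatenating the files, lets the assembler treat each library's insert size on its own.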

written 3.1 years ago by Philipp Bayer

I'd also like to mention SPAdes, which is the only de Bruijn graph assembler I'm aware of that actually builds the graph using paired k-mers (k-bimers) from paired reads.
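To make the paired k-mer idea a bit more concrete, here is a toy Python sketch. It is not how SPAdes is implemented. It only shows the basic idea of a paired de Bruijn graph, under simplifying assumptions: reads are in forward/reverse orientation, both mates have the same length, there are no sequencing errors, and the variation in insert size is ignored. All function names are invented for the example.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return all k-mers of seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def paired_kmers(read1, read2, k):
    """Pair the i-th k-mer of read1 with the i-th k-mer of its mate.

    The mate is reverse-complemented first so both k-mers lie on the
    same strand; each pair is then separated by the same (unknown) gap.
    """
    mate = reverse_complement(read2)
    return list(zip(kmers(read1, k), kmers(mate, k)))


def paired_de_bruijn_edges(read_pairs, k):
    """Build a toy paired de Bruijn graph.

    Nodes are pairs of (k-1)-mers; every paired k-mer contributes one
    edge from its pair of prefixes to its pair of suffixes.
    """
    graph = defaultdict(set)
    for read1, read2 in read_pairs:
        for left, right in paired_kmers(read1, read2, k):
            source = (left[:-1], right[:-1])
            target = (left[1:], right[1:])
            graph[source].add(target)
    return graph


# One read pair sampled from the two ends of the fragment ACGTAC...TTGGCA.
pairs = [("ACGTAC", "TGCCAA")]
graph = paired_de_bruijn_edges(pairs, k=4)
for source, targets in graph.items():
    print(source, "->", sorted(targets))
```

Because each node carries two (k-1)-mers instead of one, a repeat that collapses in an ordinary de Bruijn graph can stay separated here when its copies have different partners at the other end of the fragment.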

written 3.1 years ago by Brian Bushnell

Thanks, I missed that! SPAdes has given me really good (best?) results with smaller (non-plant) genomes.

written 3.1 years ago by Philipp Bayer

It consistently gives us the best results with bacteria/archaea. We also routinely use it for metagenomes, but its resource requirements are so much higher than MEGAHIT's that it is only usable on low-complexity metagenomes, or on metagenomes that have been heavily processed (normalized, error-corrected, and with low-depth reads removed). I've never tried it on a eukaryote, but it would not surprise me if it did a good job, particularly on haploid eukaryotes like some of the fungi we assemble.

written 3.1 years ago by Brian Bushnell
