Question

De Novo Genome Assembler Preferring Shorter Error-Free Contigs

1


10.1 years ago

misaghb ▴ 20

Hi folks. I need to run a de novo short-read genome assembler (on a paired-end/mate-pair library) that prefers outputting shorter but error-free contigs over longer contigs/scaffolds that may be mis-assembled. Which assembler, or which specific settings in an assembler of your choice, would you recommend to yield such contigs (as error-free as possible, with no overlapping contigs)?

contigs denovo genome • 2.5k views

updated 2.1 years ago by Ram 43k • written 10.1 years ago by misaghb ▴ 20

Ram · Accepted Answer · 2014-03-27

I think whether you get error-free contigs also depends on the quality of your data and on contamination, if any. It also depends on the repetitiveness of the genome, the level of polymorphism (which matters for judging the correctness of contigs) and the heterozygosity of the individual. SOAP contigs are short, as they start from K+1 for your k-mer size. By increasing the min_abundance parameter (the minimum number of times a k-mer must be seen to be kept) in de novo assemblers that expose one, you can get more accurate contigs. Minia is definitely one of the ones to try out.

If you have a smaller number of error-free reads, go for an overlap-based assembler such as CAP3. This won't work for a large number of reads because of memory constraints.
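To make the min_abundance idea above concrete, here is a minimal Python sketch of k-mer abundance filtering. It is only an illustration of the principle, not the code of Minia or any other assembler: the function names `count_kmers` and `solid_kmers` and the toy reads are made up for this example, and it ignores reverse complements, which real assemblers handle by counting canonical k-mers.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (forward strand only) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(reads, k, min_abundance):
    """Return the set of k-mers seen at least min_abundance times.

    k-mers below the threshold are assumed to come from sequencing
    errors and are dropped before any graph is built.
    """
    counts = count_kmers(reads, k)
    return {kmer for kmer, n in counts.items() if n >= min_abundance}


# Toy data (hypothetical): three identical reads plus one read with a
# single substitution near its end.
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGTTC"]
kept = solid_kmers(reads, k=4, min_abundance=2)
print(sorted(kept))  # k-mers unique to the erroneous read are gone
```

Raising the threshold removes more error k-mers but also drops genuine k-mers from low-coverage regions, so contigs get shorter as they get cleaner. That is the trade-off the question is asking for.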

Ram · Accepted Answer · 2014-03-28

2


10.1 years ago

lexnederbragt ★ 1.3k

According to the first GAGE paper, SGA produces shorter but very accurate contigs. See http://genome.cshlp.org/content/early/2012/01/12/gr.131383.111.full.pdf

updated 2.1 years ago by Ram 43k • written 10.1 years ago by lexnederbragt ★ 1.3k

Ram · Accepted Answer · 2015-01-06

According to this paper in the journal BMC Bioinformatics:

For short-read libraries (e.g. Illumina MiSeq): CLC bio assembler (CLC Assembly Cell) (commercial, free 2-week trial)
For Roche 454 read libraries: Newbler (Roche)

These assemblers tend to break reads and contigs at repeat boundaries and place repeated elements into separate contigs. As a result, you are likely to get more conservative and better-quality contigs (less likely to be mis-assembled).
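The "break at repeat boundaries" behaviour can be sketched on a toy de Bruijn graph. This is not how CLC Assembly Cell or Newbler are implemented internally. It only illustrates the general idea that a conservative assembler stops a contig wherever the graph branches instead of guessing a path through a repeat. The function names and the two toy sequences are assumptions made for this example.

```python
from collections import defaultdict


def build_graph(kmers):
    """Nodes are (k-1)-mers; each k-mer is an edge prefix -> suffix."""
    succ = defaultdict(set)
    pred = defaultdict(set)
    for kmer in kmers:
        a, b = kmer[:-1], kmer[1:]
        succ[a].add(b)
        pred[b].add(a)
    return succ, pred


def unitigs(kmers):
    """Return maximal non-branching paths, breaking at every branch.

    Isolated cycles made only of non-branching nodes are skipped in
    this simplified sketch.
    """
    succ, pred = build_graph(kmers)
    nodes = set(succ) | set(pred)

    def is_simple(n):
        # Exactly one way in and one way out: safe to extend through.
        return len(succ[n]) == 1 and len(pred[n]) == 1

    contigs = []
    for start in sorted(nodes):
        if is_simple(start):
            continue  # contigs only start at tips or branch nodes
        for nxt in sorted(succ[start]):
            path = start + nxt[-1]
            while is_simple(nxt):
                nxt = next(iter(succ[nxt]))
                path += nxt[-1]
            contigs.append(path)
    return contigs


def kmers_of(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Two toy sequences that share the repeat "CATG" (hypothetical data).
kmers = set(kmers_of("AACATGTT", 4)) | set(kmers_of("GGCATGCC", 4))
print(unitigs(kmers))
# → ['AACAT', 'ATGCC', 'ATGTT', 'CATG', 'GGCAT']
```

The shared repeat ends up in its own short contig and the flanking sequences stay separate, so nothing is joined on the basis of a guess. Note that neighbouring contigs still share k-1 bases at the branch nodes, so they would need trimming if strictly non-overlapping output is required.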