De Novo Genome Assembler Preferring Shorter Error-Free Contigs
7.8 years ago
misaghb ▴ 20

Hi folks. I need to run a de novo short-read genome assembler on a paired-end/mate-pair library, and I would prefer shorter but error-free contigs over longer contigs or scaffolds that may be mis-assembled. Which assembler, or which specific settings in an assembler of your choice, would you recommend to get contigs that are as error-free as possible and do not overlap one another?

denovo genome contigs • 2.0k views
7.8 years ago
Rohit ★ 1.5k

I think getting error-free contigs also depends on the quality of your data and on any contamination. It further depends on the repetitiveness of the genome, the level of polymorphism (which you need to know in order to judge the correctness of the contigs), and the heterozygosity of the individual. SOAP contigs are short, as they start from K+1 of your k-mer. By increasing the min_abundance parameter in de novo assemblers you can get more accurate contigs, since k-mers seen only a few times are mostly sequencing errors and get discarded. Minia is definitely one to try out.
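To make the min_abundance idea concrete, here is a minimal Python sketch of abundance filtering. It is only an illustration, not how Minia or any other assembler implements it: the function name `solid_kmers` and the toy reads are made up for the example, and reverse complements are ignored for simplicity.

```python
from collections import Counter

def solid_kmers(reads, k, min_abundance):
    """Return the k-mers seen at least min_abundance times across all reads.

    Hypothetical helper for illustration; real assemblers also count
    reverse complements and use far more memory-efficient counting.
    """
    counts = Counter()
    for read in reads:
        # Slide a window of length k along the read and count every k-mer.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return {kmer for kmer, n in counts.items() if n >= min_abundance}

# Toy data: the third read differs from the others by one simulated error.
reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
print(sorted(solid_kmers(reads, k=4, min_abundance=2)))
```

With a threshold of 2, the two k-mers created by the simulated error (CGTT and GTTC) are dropped, so they cannot contribute to a contig. The trade-off is that genuinely low-coverage regions are dropped too, which is one reason the resulting contigs get shorter.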

If you have a small number of error-free reads, go for an overlap assembler such as CAP3. This won't work for a large number of reads because of memory constraints.
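As a rough picture of why overlap-based assembly gets expensive, here is a small Python sketch of all-against-all suffix-prefix overlap detection. It is not CAP3's algorithm (CAP3 uses error-tolerant alignment); the function names and the exact-match rule are assumptions made for the example.

```python
from itertools import permutations

def overlap(a, b, min_len):
    """Length of the longest suffix of a that exactly equals a prefix of b.

    Returns 0 if no overlap of at least min_len exists. Exact matching is a
    simplification; real overlap assemblers tolerate mismatches.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def all_overlaps(reads, min_len):
    """Compare every ordered pair of reads and keep the overlapping ones."""
    found = {}
    for a, b in permutations(reads, 2):
        n = overlap(a, b, min_len)
        if n:
            found[(a, b)] = n
    return found

reads = ["ACGTAC", "GTACGG", "CGGTTT"]
print(all_overlaps(reads, min_len=3))
```

The number of ordered pairs grows as n * (n - 1), so this naive approach is only practical for small read sets, which is in line with the caveat above.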

7.8 years ago
lexnederbragt ★ 1.3k

According to the first GAGE paper, SGA produces shorter but very accurate contigs. See http://genome.cshlp.org/content/early/2012/01/12/gr.131383.111.full.pdf

7.0 years ago
misaghb ▴ 20

According to this paper in the journal BMC Bioinformatics:

  • For short-read libraries (e.g. Illumina MiSeq): CLC bio assembler (CLC Assembly Cell) (commercial, free 2-week trial)
  • For Roche 454 read libraries: Newbler (Roche)

These assemblers tend to break reads and contigs at repeat boundaries and place repeated elements into separate contigs. Hence we should get more conservative, better-quality contigs that are less likely to be mis-assembled.
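To illustrate what "breaking at repeat boundaries" means, here is a minimal Python sketch that builds a toy de Bruijn graph and only extends a contig while the path is unambiguous. It is a generic illustration, not the method used by CLC Assembly Cell or Newbler; the function name `unitigs` and the toy k-mers are assumptions, and reverse complements and isolated cycles are ignored.

```python
from collections import defaultdict

def unitigs(kmers):
    """Merge k-mers into maximal non-branching paths, stopping at every branch."""
    kmers = set(kmers)
    succ = defaultdict(list)
    pred = defaultdict(list)
    for a in kmers:
        for base in "ACGT":
            b = a[1:] + base  # k-mer that overlaps a by k-1 bases
            if b in kmers:
                succ[a].append(b)
                pred[b].append(a)

    def is_start(node):
        # A contig starts at a node that is not the sole continuation
        # of a single predecessor.
        p = pred[node]
        return len(p) != 1 or len(succ[p[0]]) != 1

    contigs, visited = [], set()
    for node in sorted(kmers):
        if not is_start(node):
            continue
        seq, cur = node, node
        visited.add(node)
        # Extend only while there is exactly one way forward and one way back.
        while len(succ[cur]) == 1:
            nxt = succ[cur][0]
            if len(pred[nxt]) != 1 or nxt in visited:
                break
            seq += nxt[-1]
            visited.add(nxt)
            cur = nxt
        contigs.append(seq)
    return sorted(contigs)

# Toy 3-mers from two loci, ATGCGA and TTGCGT, that share the repeat TGCG.
kmers = ["ATG", "TGC", "GCG", "CGA", "TTG", "CGT"]
print(unitigs(kmers))
```

In this toy case the shared repeat comes out as its own contig and the four unique flanks stay separate, rather than being joined through the repeat in a way that might be wrong. That is the conservative behaviour described above, at the cost of shorter contigs.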
