Question: De Novo Genome Assembler Preferring Shorter Error-Free Contigs
1
5.2 years ago by
misaghb20
United States
misaghb20 wrote:

Hi folks. I need to run a de novo short-read genome assembler (on a paired-end/mate-pair library) that prefers to output shorter but error-free contigs rather than longer contigs/scaffolds that may be mis-assembled. Which assembler, or which specific setting in an assembler of your choice, would you recommend to yield such contigs (as error-free as possible, with no overlapping contigs)?

genome contigs denovo • 1.4k views
modified 4.4 years ago • written 5.2 years ago by misaghb20
4
5.2 years ago by
Rohit1.3k
California
Rohit1.3k wrote:

I think error-free contigs depend on the quality of your data too, and on contamination, if any. They also depend on the repetitiveness of the genome, the level of polymorphism (which you need to know in order to judge the correctness of contigs) and the heterozygosity of the individual. SOAP contigs are short, as their lengths start from k+1 for your k-mer size. By increasing the min_abundance parameter in de novo assemblers, you can get more accurate contigs. Minia is definitely one of the ones to try out.
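To make the min_abundance idea concrete, here is a minimal Python sketch of k-mer abundance filtering. It is only an illustration under simplifying assumptions (toy reads, no reverse complements, everything held in memory), not how Minia or any other assembler implements it; the function names `count_kmers` and `solid_kmers` are made up for this example.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def solid_kmers(reads, k, min_abundance):
    """Keep only the k-mers seen at least min_abundance times."""
    counts = count_kmers(reads, k)
    return {kmer for kmer, n in counts.items() if n >= min_abundance}

# Toy data: three reads agree, the fourth carries a single-base error (T -> G).
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGGAC"]

# With min_abundance=1 the error k-mers survive; with 2 they are dropped.
print(sorted(solid_kmers(reads, k=3, min_abundance=1)))
print(sorted(solid_kmers(reads, k=3, min_abundance=2)))
```

The trade-off is that a higher threshold also throws away genuine k-mers from low-coverage regions, so the contigs get more accurate but also shorter and more fragmented.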

If you have a small number of error-free reads, go for an overlap-based assembler such as CAP3. This wouldn't work for a large number of reads due to memory constraints.
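As a rough illustration of why overlap-based assembly gets expensive, here is a naive Python sketch that looks for exact suffix-prefix overlaps between every ordered pair of reads. This is not CAP3's actual algorithm (real assemblers use alignment and indexing tricks); the helper names and the toy reads are assumptions for the example.

```python
from itertools import permutations

def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no exact overlap of at least min_len bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def all_overlaps(reads, min_len):
    """Compare every ordered pair of reads: n * (n - 1) comparisons."""
    found = {}
    for a, b in permutations(reads, 2):
        n = overlap(a, b, min_len)
        if n > 0:
            found[(a, b)] = n
    return found

# Three toy reads that chain together through 3-base overlaps.
reads = ["ACGTTG", "TTGCAA", "CAAGGT"]
print(all_overlaps(reads, min_len=3))
```

Three reads need only 6 comparisons, but the number of pairs grows quadratically with the number of reads, which is one reason this style of assembly stops being practical for large short-read datasets.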

modified 5.2 years ago • written 5.2 years ago by Rohit1.3k
2
5.2 years ago by
lexnederbragt1.2k
Oslo, Norway
lexnederbragt1.2k wrote:

According to the first GAGE paper, SGA produces shorter but highly accurate contigs. See http://genome.cshlp.org/content/early/2012/01/12/gr.131383.111.full.pdf

written 5.2 years ago by lexnederbragt1.2k
1
4.4 years ago by
misaghb20
United States
misaghb20 wrote:

According to this paper in BMC Bioinformatics journal:

  • For short-read libraries (e.g. Illumina MiSeq): CLC bio assembler (CLC Assembly Cell) (commercial, free 2-week trial)
  • For Roche 454 read libraries: Newbler (Roche)

These assemblers tend to break reads and contigs at repeat boundaries and place repeated elements into separate contigs. Hence we might get more conservative, better-quality contigs that are less likely to be mis-assembled.
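To show what breaking at repeat boundaries looks like, here is a toy Python sketch that builds a de Bruijn graph from error-free k-mers and reports only the unbranched paths (unitigs). It is not a description of how the CLC assembler or Newbler work internally; the function names, the toy sequence and the choice of k=4 are assumptions for illustration, and it ignores reverse complements and isolated cycles.

```python
from collections import defaultdict

def kmers_of(seq, k):
    """Set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_graph(kmers):
    """Nodes are (k-1)-mers; each k-mer is an edge from its prefix to its suffix."""
    out_edges = defaultdict(list)
    in_degree = defaultdict(int)
    for kmer in sorted(kmers):
        out_edges[kmer[:-1]].append(kmer[1:])
        in_degree[kmer[1:]] += 1
    return out_edges, in_degree

def unitigs(kmers):
    """Walk from every branching or terminal node and stop at the next one."""
    out_edges, in_degree = build_graph(kmers)
    nodes = set(out_edges) | set(in_degree)

    def is_simple(node):
        # Exactly one way in and one way out: safe to extend through.
        return in_degree.get(node, 0) == 1 and len(out_edges.get(node, [])) == 1

    contigs = []
    for node in sorted(nodes):
        if is_simple(node):
            continue
        for nxt in out_edges.get(node, []):
            path = node + nxt[-1]
            while is_simple(nxt):
                nxt = out_edges[nxt][0]
                path += nxt[-1]
            contigs.append(path)
    return contigs

# Toy genome in which ATGG occurs twice, each time with different neighbours.
genome = "CATGGAATGGTC"
print(unitigs(kmers_of(genome, k=4)))
```

In this toy case the repeated ATGG ends up as its own short contig and the assembly stops on either side of it instead of guessing which copy connects to which flank. Note that neighbouring unitigs still share k-1 bases at the junctions, so they would need trimming if strictly non-overlapping contigs are required.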

written 4.4 years ago by misaghb20