Question

Assessing Quality And Accuracy Of De Novo Genome Assembly

11.7 years ago

a81526a ▴ 60

All, I am curious whether anyone has a method for assessing the quality and accuracy of de novo genome assemblies. I am currently running in silico simulations of de novo assembly, using reads simulated from a previously sequenced genome, to determine the best assembly parameters (k-mer size, coverage cutoff, etc.) and the optimal dataset (mate-pair library size, coverage, etc.). The ultimate goal is to use these parameters to assemble the genome of a related species de novo.
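
For the simulation step, the simplest possible read simulator samples error-free reads uniformly from the reference until the requested coverage is reached, using the usual expectation N = coverage × G / L reads (G = genome length, L = read length). This is only a minimal sketch under those assumptions: the function name `simulate_reads` is hypothetical, it samples the forward strand only, and it models no sequencing errors, pairs or mate-pair inserts (dedicated simulators such as wgsim or ART handle those).

```python
import random


def simulate_reads(genome: str, read_len: int, coverage: float, seed: int = 0) -> list[str]:
    """Hypothetical helper: sample error-free, forward-strand reads to ~`coverage`x.

    Number of reads N = coverage * G / L, with G = len(genome), L = read_len.
    """
    if read_len > len(genome):
        raise ValueError("read length exceeds genome length")
    rng = random.Random(seed)  # seeded so simulations are reproducible
    n_reads = round(coverage * len(genome) / read_len)
    reads = []
    for _ in range(n_reads):
        # randint is inclusive, so the last valid start is len(genome) - read_len
        start = rng.randint(0, len(genome) - read_len)
        reads.append(genome[start:start + read_len])
    return reads
```

For a 1,000 bp genome, 100 bp reads at 10x gives round(10 × 1000 / 100) = 100 reads.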

However, the difficulty is that after simulating the data and making a de novo assembly, I don't know of any statistics or methods for comparing the assembled contigs back to the original sequence they were simulated from. This requires two steps: (1) align the assembled contigs to the reference genome, and (2) assess the fit.
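
As a rough sketch of both steps with BLAST: (1) align contigs to the reference in BLAST's default tabular format (for example `blastn -query contigs.fa -subject reference.fa -outfmt 6`), then (2) classify each contig from its hits. The classification rule below is a hypothetical heuristic, not a published method: a contig is "ok" if its longest hit covers at least `min_frac` of it, "misassembled" if its hits land on different reference sequences, opposite strands, or non-collinear positions (more than `max_jump` bp off the same diagonal), "partial" otherwise, and "unaligned" if it has no hits. The thresholds are arbitrary assumptions, and repeats will cause false "misassembled" calls.

```python
from collections import defaultdict

# Column order of BLAST's default tabular output (-outfmt 6).
BLAST6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_COLUMNS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")


def parse_blast6(lines):
    """Group BLAST -outfmt 6 hits by contig (query) id."""
    hits = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        hit = dict(zip(BLAST6_COLUMNS, line.rstrip("\n").split("\t")))
        for key in INT_COLUMNS:
            hit[key] = int(hit[key])
        hit["pident"] = float(hit["pident"])
        hits[hit["qseqid"]].append(hit)
    return hits


def diagonal(hit):
    """Return (is_forward, offset); collinear blocks share both values."""
    forward = hit["send"] >= hit["sstart"]
    # Forward strand: reference pos - contig pos is constant along the block.
    # Reverse strand: reference pos + contig pos is constant instead.
    offset = hit["sstart"] - hit["qstart"] if forward else hit["sstart"] + hit["qstart"]
    return forward, offset


def classify_contigs(hits, contig_lengths, min_frac=0.95, max_jump=1000):
    """Hypothetical heuristic labelling each contig ok/partial/misassembled/unaligned."""
    status = {}
    for contig, length in contig_lengths.items():
        aligns = sorted(hits.get(contig, []), key=lambda h: h["length"], reverse=True)
        if not aligns:
            status[contig] = "unaligned"
            continue
        best = aligns[0]
        if (best["qend"] - best["qstart"] + 1) / length >= min_frac:
            status[contig] = "ok"
            continue
        best_strand, best_offset = diagonal(best)
        status[contig] = "partial"
        for other in aligns[1:]:
            strand, offset = diagonal(other)
            if (other["sseqid"] != best["sseqid"] or strand != best_strand
                    or abs(offset - best_offset) > max_jump):
                status[contig] = "misassembled"
                break
    return status
```

Counting the labels over all contigs gives a simple accuracy summary to set beside N50 when comparing parameter sets.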

People often optimize N50, total assembly size, contig number and other length-based measurements, but this only rewards bigger and bigger contigs and says little about whether those contigs are accurate. I have been using BLAST to compare the contigs to the reference, asking how well they fit, how long the alignments are and how many contigs are misassembled. If anyone has ideas or methods for assessing accuracy (or the overall similarity of an assembly to a genome), I would be grateful to hear about them.
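
To illustrate why N50 alone can mislead: N50 is the length L such that contigs of length ≥ L contain at least half of the total assembly size. A minimal sketch (the function name is just for illustration) shows that wrongly joining two contigs raises N50 even though the assembly got worse.

```python
def n50(lengths):
    """Length of the contig at which the cumulative sum (longest first) reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    return 0  # empty assembly


print(n50([100, 200, 300, 400]))  # 300
print(n50([100, 200, 700]))       # 700: the 300 and 400 contigs falsely joined
```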

assembly blast contigs quality similarity genome

updated 10.2 years ago by h.mon 35k • written 11.7 years ago by a81526a ▴ 60

Duplicate of: How to assess the quality of an assembly? (Is there no magic formula?)

11.7 years ago by SES 8.6k

score 1 · Answer 1 · 2013-10-30

11.7 years ago

Rohit ★ 1.5k

There is a tool named QUAST for assessing genome assemblies: http://bioinf.spbau.ru/QUAST

I have never used it, though.

11.7 years ago by Rohit ★ 1.5k

If there is a similar reference to compare against, this is very good at giving a "real" N50 value.
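
The "real" N50 QUAST reports is NGA50: as I understand the QUAST documentation, contigs are first broken at misassemblies into aligned blocks, and the 50% threshold uses the reference length instead of the assembly length (the "G" in NG50). A minimal sketch of that last calculation, assuming you already have the aligned block lengths (the numbers below are made up for illustration):

```python
def ng50(lengths, reference_length):
    """NG50: like N50, but the threshold is half the reference length, not the assembly length."""
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= reference_length:
            return length
    return None  # the sequences cover less than half of the reference


contigs = [700, 200, 100]              # the 700 bp contig is a false join
aligned_blocks = [400, 300, 200, 100]  # same assembly broken at the misjoin
print(ng50(contigs, 1200))             # 700 (NG50)
print(ng50(aligned_blocks, 1200))      # 300 (an NGA50-style value)
```

Passing aligned blocks instead of contigs is what penalizes the false join.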

11.7 years ago by rob234king ▴ 610

score 0 · Answer 2 · 2015-05-08

The software Rohit mentioned in his answer, QUAST, accepts a reference genome and provides an analysis of the assembly against it, including the percentage of the reference covered (% overlap), the percentage missing and the number of misassemblies. I believe QUAST uses nucmer from the MUMmer package to perform this analysis.
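
The covered/missing percentages can be approximated from any set of alignments by merging the reference intervals hit by contigs, for example reference start/end coordinates from MUMmer's `show-coords` or BLAST's `sstart`/`send`. This is a simplified sketch, not QUAST's exact computation: it assumes 1-based inclusive coordinates given as hypothetical `(reference_id, start, end)` tuples, and it counts every aligned base regardless of identity.

```python
from collections import defaultdict


def genome_fraction(alignments, ref_lengths):
    """Return (% of reference covered, % missing) from (ref_id, start, end) intervals."""
    by_ref = defaultdict(list)
    for ref, start, end in alignments:
        # Reverse-strand hits have start > end; normalize to (low, high).
        by_ref[ref].append((min(start, end), max(start, end)))
    covered = 0
    for intervals in by_ref.values():
        intervals.sort()
        cur_lo, cur_hi = intervals[0]
        for lo, hi in intervals[1:]:
            if lo <= cur_hi + 1:  # overlapping or adjacent: extend current run
                cur_hi = max(cur_hi, hi)
            else:                 # gap: close the current run
                covered += cur_hi - cur_lo + 1
                cur_lo, cur_hi = lo, hi
        covered += cur_hi - cur_lo + 1
    pct = 100.0 * covered / sum(ref_lengths.values())
    return pct, 100.0 - pct


hits = [("chr1", 1, 400), ("chr1", 300, 600), ("chr1", 900, 701), ("chr2", 1, 100)]
print(genome_fraction(hits, {"chr1": 1000, "chr2": 1000}))  # (45.0, 55.0)
```

Merging matters because overlapping contigs would otherwise count the same reference bases twice.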

BLAST Ring Image Generator may also be helpful for your purpose.