Assessing Quality And Accuracy Of De Novo Genome Assembly
10.5 years ago
a81526a ▴ 60

All, I am curious whether anyone has a method for assessing the quality and accuracy of de novo genome assemblies. I am currently running in silico simulations of de novo assembly from a previously sequenced genome to determine the best assembly parameters (k-mer size, coverage cutoff, etc.) and the optimal dataset (mate-pair library size, coverage, etc.). The ultimate goal is to use these parameters to assemble the genome of a related species de novo.

However, the difficulty is that after simulating the data and making a de novo assembly, I don't know of any statistics or methods for comparing the assembled contigs back to the original sequence they were simulated from. This requires two steps: (1) align the assembled contigs to the reference genome, and (2) assess the fit.
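One possible way to put a number on step (2), sketched here under assumptions rather than taken from any established tool: if the contigs are aligned with BLAST and the hits are saved in the standard 12-column tabular format (`-outfmt 6`, where columns 9 and 10 are the subject start and end), the fraction of the reference covered by at least one alignment can be computed by merging the hit intervals. The function name `reference_coverage` and the `ref_lengths` dictionary are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def reference_coverage(blast_lines, ref_lengths):
    """Fraction of each reference sequence covered by at least one hit.

    blast_lines: iterable of lines in BLAST tabular format (-outfmt 6),
                 with the reference used as the subject (database).
    ref_lengths: dict mapping reference sequence ID -> length in bases
                 (assumed to be known, e.g. from the reference FASTA).
    """
    intervals = defaultdict(list)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        sseqid = fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        # Hits on the minus strand are reported with sstart > send.
        intervals[sseqid].append((min(sstart, send), max(sstart, send)))

    coverage = {}
    for ref, length in ref_lengths.items():
        covered = 0
        cur_lo = cur_hi = None
        # Merge overlapping intervals so no base is counted twice.
        for lo, hi in sorted(intervals.get(ref, [])):
            if cur_hi is None or lo > cur_hi:
                if cur_hi is not None:
                    covered += cur_hi - cur_lo + 1
                cur_lo, cur_hi = lo, hi
            else:
                cur_hi = max(cur_hi, hi)
        if cur_hi is not None:
            covered += cur_hi - cur_lo + 1
        coverage[ref] = covered / length
    return coverage
```

This only measures how much of the reference is represented. It does not detect misjoined contigs, and in practice the hits would probably need filtering by identity and alignment length first.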

People often optimize N50, assembly size, contig number and other length-based measurements, but this only rewards bigger and bigger contigs and says little about whether those contigs are accurate. I have been using BLAST to compare the contigs to the reference and asking how well they fit, how long the alignments are and how many misassembled contigs there are. If anyone has ideas or methods for assessing the accuracy (or the overall similarity of an assembly and a genome), I would be grateful to hear about them.
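For concreteness, here is a minimal sketch of the length-based statistic mentioned above. N50 is the contig length at which the contigs of that length or longer contain at least half of the assembled bases. Passing the known reference size as `total` gives the related NG50, which is handy in a simulation where the true genome size is known. The function name `n50` and its `total` parameter are illustrative choices, not part of any particular package.

```python
def n50(lengths, total=None):
    """Return the N50 of a list of contig lengths.

    total: the size to take half of. Defaults to the sum of the contig
           lengths (plain N50). Pass the reference genome size instead
           to get NG50.
    Returns 0 if the contigs never reach half of `total`.
    """
    if total is None:
        total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0
```

Note that the calculation uses only lengths: a chimeric contig that wrongly joins two distant regions raises N50 just as much as a correct one, which is why a reference-based check is needed as well.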

assembly blast contigs quality similarity genome • 7.8k views
10.5 years ago
Rohit ★ 1.5k

There is a tool named QUAST for assessing genome assemblies: http://bioinf.spbau.ru/QUAST

I have never used it, though.

If there is a similar reference to compare against, then this is very good at giving a "real" N50 value.

9.0 years ago
h.mon 35k

QUAST, the software Rohit mentioned in his answer, accepts a reference genome and provides an analysis of the assembly against it, including %overlap, %missing and misassemblies. I believe QUAST uses nucmer from the MUMmer package to perform this analysis.
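As a rough sketch of how a reference-based QUAST run might be scripted across many simulated assemblies: the snippet below assumes QUAST is installed and exposed as `quast.py` on the PATH, and uses its `-r` (reference) and `-o` (output directory) options. The helper names `build_quast_command` and `run_quast` and all file paths are placeholders invented for this example.

```python
import shutil
import subprocess


def build_quast_command(contigs, reference, outdir, executable="quast.py"):
    """Build the argument list for a reference-based QUAST run."""
    return [executable, contigs, "-r", reference, "-o", outdir]


def run_quast(contigs, reference, outdir):
    """Run QUAST and return the output directory.

    Raises FileNotFoundError if the executable is not on the PATH, and
    subprocess.CalledProcessError if QUAST exits with a non-zero status.
    """
    cmd = build_quast_command(contigs, reference, outdir)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)
    return outdir
```

Calling something like `run_quast("contigs.fasta", "reference.fasta", "quast_k31")` once per parameter set would leave one report directory per assembly for side-by-side comparison.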

BLAST Ring Image Generator may also be helpful for your purpose.
