QUAST is a convenient tool for assembly evaluation. The QUAST paper has been accepted to Bioinformatics and is now available in Advance Access.
QUAST computes a number of well-known metrics, including contig accuracy, number of genes discovered, N50, and others. It also introduces some new statistics, like NA50 (see paper). An analysis produces summary tables in several formats: human-readable plain text, easy-to-parse tab-separated tables, and LaTeX. Additionally, the tool generates colorful plots. All tables and plots are summarized in an interactive HTML report.
QUAST provides an intuitive command-line interface and a detailed manual that guides you through all the metrics. The source code is available on SourceForge. Furthermore, we have started a beta version of the QUAST web interface. We appreciate your feedback regarding the tool and the website at firstname.lastname@example.org.
I guess the suite does not easily work for evaluating human assemblies because nucmer may require too much memory? NA50 has been used several times before. For example, I used something similar a year ago, and my friends had used it even earlier. That said, I am very impressed by IDBA-UD and SPAdes.
Yes, you are right that evaluating a human assembly is a challenge for QUAST because it uses Nucmer. However, it is feasible, and QUAST tries to make it as fast as possible. Nucmer has a limit on the reference genome size of 536 Mbp, which is insufficient for a human assembly (approx. 3 Gbp). To enable processing of such large references, QUAST splits the reference into single chromosomes, which are usually shorter than 536 Mbp, and aligns the assembly to each reference chromosome separately. These alignments can be run in parallel, which gives a significant speed-up. After that, QUAST carefully combines all alignments into a single file to produce the same results as if Nucmer had been run on the whole reference genome. Regarding the NA50 issue: yes, there are some similar metrics in other tools, and we included the appropriate references in the paper. However, most of these metrics just break contigs into aligned blocks and don't take into account the severity of the errors behind the wrong connection of several blocks into a single contig. We distinguish two types of connection errors: extensive misassemblies (or just misassemblies) and local ones. An example of a misassembly is a connection of two blocks from different chromosomes, different strands, or distant fragments of one chromosome. An example of a local misassembly is a small (less than 1 kbp) gap or overlap between two blocks in one contig. When computing NA50, we break contigs only at extensive misassembly events.
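To make the NA50 rule above concrete, here is a minimal Python sketch. It is not QUAST's actual code: the function names, the "local"/"extensive" labels, and the input layout are all made up for illustration. It assumes the alignment step has already produced, for each contig, the lengths of its aligned blocks and a label for each junction between neighbouring blocks. It also takes the N50 total to be the sum of the resulting pieces, which may differ from what QUAST does.

```python
from typing import List, Sequence, Tuple


def n50(lengths: Sequence[int]) -> int:
    """Largest length L such that pieces of length >= L cover at least half of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def break_contig(block_lengths: Sequence[int], junctions: Sequence[str]) -> List[int]:
    """Split one contig at extensive misassemblies only.

    junctions[i] labels the join between block i and block i + 1 and is
    either "local" or "extensive" (hypothetical labels for this sketch).
    """
    if not block_lengths:
        return []
    if len(junctions) != len(block_lengths) - 1:
        raise ValueError("need exactly one junction between each pair of blocks")
    pieces = []
    current = block_lengths[0]
    for length, junction in zip(block_lengths[1:], junctions):
        if junction == "extensive":
            # Extensive misassembly: close the current piece and start a new one.
            pieces.append(current)
            current = length
        else:
            # Local misassembly: keep the blocks joined. Simplification: the
            # small gap or overlap itself is ignored when adding lengths.
            current += length
    pieces.append(current)
    return pieces


def na50(contigs: Sequence[Tuple[Sequence[int], Sequence[str]]]) -> int:
    """N50 over the pieces left after breaking every contig at extensive misassemblies."""
    pieces: List[int] = []
    for block_lengths, junctions in contigs:
        pieces.extend(break_contig(block_lengths, junctions))
    return n50(pieces)


if __name__ == "__main__":
    example = [
        ([400, 300], ["local"]),      # stays one 700 bp piece
        ([500, 200], ["extensive"]),  # broken into 500 bp and 200 bp
        ([100], []),                  # single block, nothing to break
    ]
    print(n50([700, 700, 100]), na50(example))
```

In this toy input, the three contigs (700, 700, and 100 bp) have an N50 of 700, while breaking the second contig at its extensive misassembly lowers NA50 to 500. The contig with only a local misassembly is left intact.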
On nucmer, have you considered other similar tools like Mauve or other whole-genome alignment tools that scale better to the human genome? On NA50, it depends on the alignment tool. If the aligner allows a 1 kbp gap, it will be able to align across small gaps.
We tried Mauve at the beginning of the QUAST project, but in our experiments (we worked with bacterial genomes) it was 5-10 times slower than Nucmer. Maybe it would do better on human-sized genomes, but I'm not sure about that. I think we will try it and other alignment tools, and we may change our core aligner in the future, but currently Nucmer is OK for us, especially in bacterial projects or in projects with genomes 100-500 Mbp long.
Does this essentially require a reference file?
Is there any way to use QUAST with a diploid assembly? I'm using the Falcon assembler, and it seems that QUAST doesn't really know how to deal with it (I can attach a picture if you want).
Thanks for your answer!