SOAPdenovo Assembly Quality Assessment
11.6 years ago
toshnam ▴ 650

Hi all,

I have assembly results (scaffolds) from SOAPdenovo and want to assess their quality.

How can I generate a statistics report containing N50, the scaffold length distribution, the contig count, and so on?

Please recommend a suitable program.

hiseq assembly • 8.1k views
11.6 years ago
Ketil 4.1k

Hey, me too!

http://blog.malde.org/index.php/a50

Although the program reports N25, N50, and N75, the main selling point here is the graphics. I think the curves (similar to ROC curves) give a much better picture of the assembly than N50 alone. (N50 is, of course, the slope of the curve at y = total_size/2.)
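To make the relationship between the curve and the Nx values concrete, here is a minimal Python sketch. It is not the a50 program itself; it assumes the curve plots cumulative size against contig rank with contigs sorted longest first, and the function names `nx` and `cumulative_curve` and the contig lengths are made up for illustration.

```python
def nx(lengths, fraction=0.5):
    """Length L such that contigs of length >= L cover at least
    `fraction` of the total assembly size (fraction=0.5 gives N50)."""
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return 0  # only reached for an empty input


def cumulative_curve(lengths):
    """(rank, cumulative size) points for contigs sorted longest first."""
    points = []
    running = 0
    for rank, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        points.append((rank, running))
    return points


# Made-up contig lengths, total size 300.
lengths = [100, 80, 50, 30, 20, 10, 10]

for label, fraction in (("N25", 0.25), ("N50", 0.50), ("N75", 0.75)):
    print(label, nx(lengths, fraction))

curve = cumulative_curve(lengths)
print(curve)
```

Each step of the curve rises by one contig length, so the step that crosses half the total size has a height equal to N50. Plotting `curve` for several assemblies on the same axes gives the kind of side-by-side comparison described above.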

Note that to make sensible comparisons between N50s, you need to set a fixed baseline size to compare against. Most assemblers can produce a huge slew of short contigs, and some minimum size limit is used for output. How this limit is set can alter the N50 score substantially.
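The effect of the minimum size limit can be shown with a small Python sketch. The contig lengths, the 500 bp cutoff, and the 12,000 bp baseline are all invented for illustration, and `n50` is a hypothetical helper, not part of any of the tools mentioned in this thread.

```python
def n50(lengths, min_length=0, baseline=None):
    """N50 of the contigs with length >= min_length.

    If `baseline` is given, half of that fixed size is used as the
    target instead of half of the summed contig lengths.
    """
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    total = baseline if baseline is not None else sum(kept)
    target = total / 2
    running = 0
    for length in kept:
        running += length
        if running >= target:
            return length
    return 0  # the kept contigs cover less than half of the baseline


# Three long contigs plus fifty short ones (made-up numbers).
lengths = [5000, 3000, 2000] + [200] * 50

print(n50(lengths))                                  # no cutoff
print(n50(lengths, min_length=500))                  # short contigs dropped
print(n50(lengths, baseline=12000))                  # fixed baseline, no cutoff
print(n50(lengths, min_length=500, baseline=12000))  # fixed baseline, cutoff
```

With these numbers the plain N50 moves from 2000 to 5000 just by dropping the short contigs, while the value computed against the fixed baseline stays at 3000 either way.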

Also, N50 and other fragmentation measures don't tell nearly the whole story. In at least one case, the assembler that gave the best N50 scores was not nearly as accurate as the second-best assembler.

Edit: just found this interesting note touching some of the issues: http://bioinformatics.oxfordjournals.org/content/21/24/4320.full

And just to update: we recently got linkage group information for one species. It turns out that the most complete assembly (it mapped the most genes and reads, and had a pretty good, though not the best, N50) was severely misassembled. Sigh.

11.6 years ago

I wrote another one as a Biopiece:

read_fasta -i scaffold.fasta | analyze_assembly -x

N50: 93282
MAX: 381782
MIN: 500
MEAN: 40780
TOTAL: 6891834
COUNT: 169
---
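For readers without Biopieces installed, a report with the same fields can be sketched in plain Python. This is not how analyze_assembly is implemented; the function names are hypothetical, the tiny FASTA text is made up, and rounding MEAN down to an integer is an assumption based on the output shown above.

```python
import io


def read_fasta_lengths(handle):
    """Return the sequence length of every record in a FASTA stream."""
    lengths = []
    current = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths):
    """Summary statistics for a non-empty list of scaffold lengths."""
    if not lengths:
        raise ValueError("no sequences found")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = 0
    for length in ordered:
        running += length
        if running * 2 >= total:  # reached half of the total size
            n50 = length
            break
    return {
        "N50": n50,
        "MAX": ordered[0],
        "MIN": ordered[-1],
        "MEAN": total // len(ordered),  # assumption: integer mean
        "TOTAL": total,
        "COUNT": len(ordered),
    }


# Made-up three-record FASTA, used here instead of a real file.
example = io.StringIO(">s1\nACGTACGTAC\n>s2\nACGTAC\nGT\n>s3\nACGT\n")
stats = assembly_stats(read_fasta_lengths(example))
for key, value in stats.items():
    print(f"{key}: {value}")
```

To run it on real data, pass an open file handle (for example one opened on scaffold.fasta) to `read_fasta_lengths` in place of the `io.StringIO` object.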


Martin

11.6 years ago
Benm ▴ 710

I wrote one a while ago, calengc.pl; you can find it on this page

Hope it helps.

10.0 years ago
Nikolay Vyahhi ★ 1.3k

QUAST (QUality ASsessment Tool for Genome Assemblies) can be used to assess the quality of genome assemblies, both de novo and reference-based.

10.9 years ago

I suggest trying the Assemblathon metrics Perl script, which produces a lot of statistics and plots.