Quast contig values not matching
10 months ago
jamie.pike ▴ 80

Hello, I have recently generated a de novo fungal genome assembly using SPAdes. To gather some of the assembly statistics, I used QUAST Version: 5.0.2 with the following command:

quast.py --fungus -o QuastResult ../SPAdesAssembly_UK0001.fasta

I have checked the output from QUAST in the report.txt file, and I cannot work out why the "# contigs" value is smaller than the "# contigs (>= 0 bp)" value.


Assembly                    SPAdesAssembly_UK0001
# contigs (>= 0 bp)         7261  # WHY DO THESE VALUES NOT MATCH?                    
# contigs (>= 1000 bp)      4364                      
# contigs (>= 5000 bp)      1537                      
# contigs (>= 10000 bp)     831                       
# contigs (>= 25000 bp)     306                       
# contigs (>= 50000 bp)     117                       
Total length (>= 0 bp)      36362106                  
Total length (>= 1000 bp)   34953349                  
Total length (>= 5000 bp)   28238965                  
Total length (>= 10000 bp)  23295690                  
Total length (>= 25000 bp)  15154642                  
Total length (>= 50000 bp)  8596018                   
# contigs                   5580  # WHY DO THESE VALUES NOT MATCH?                          
Largest contig              152401                    
Total length                35833794                  
GC (%)                      48.88                     
N50                         18695                     
N90                         2367                      
auN                         31462.1                   
L50                         435                       
L90                         2695                      
# N's per 100 kbp           0.00   

I have checked the QUAST manual, which explains that the "contigs >= x is the total number of contigs of length >= x. This metric doesn't depend on --min-contig command line parameter" and that the "number of contigs is the total number of contigs in the assembly". So surely, in my output, the total number of contigs should equal the number of contigs with length >= 0 bp.

When I use grep to count the contigs, it returns the same value as is reported for the number of contigs with length >= 0 bp.

grep -c ">" SPAdesAssembly_UK0001.fasta
SPAdesAssembly_UK0001.fasta:7261

I can't find much more detail on the difference between these two results, or on why the total number of contigs is lower than the number of contigs with length >= 0 bp. Can anyone explain why QUAST reports different numbers here?

Thank you

8 months ago

The report.txt output file you mention should start with the note 'All statistics are based on contigs of size >= 500 bp, unless otherwise noted (e.g., "# contigs (>= 0 bp)" and "Total length (>= 0 bp)" include all contigs).' right at the top, before the listed results. This would explain why "# contigs" and "Total length" are lower than "# contigs (>= 0 bp)" and "Total length (>= 0 bp)", respectively.
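
If you want to confirm this on your own assembly, here is a minimal Python sketch (not QUAST's own code) that counts contigs and total length with and without a length cutoff. The function names `contig_lengths` and `summarize` are made up for illustration, and the cutoff of 500 bp is assumed from the note in report.txt and from QUAST's documented default for `--min-contig`.

```python
import sys
from typing import Dict, Iterable, Iterator


def contig_lengths(fasta_lines: Iterable[str]) -> Iterator[int]:
    """Yield the sequence length of each record in a FASTA file."""
    length = None  # None until the first header line is seen
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length


def summarize(lengths: Iterable[int], min_contig: int = 500) -> Dict[str, int]:
    """Count contigs and bases overall and at or above min_contig."""
    all_lengths = list(lengths)
    kept = [n for n in all_lengths if n >= min_contig]
    return {
        "contigs (>= 0 bp)": len(all_lengths),
        "total length (>= 0 bp)": sum(all_lengths),
        "contigs (>= min_contig)": len(kept),
        "total length (>= min_contig)": sum(kept),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_contigs.py SPAdesAssembly_UK0001.fasta
    with open(sys.argv[1]) as handle:
        for name, value in summarize(contig_lengths(handle)).items():
            print(f"{name}\t{value}")
```

If the explanation holds, the two filtered values should match "# contigs" (5580) and "Total length" (35833794) from your report. The gap in the report is 7261 - 5580 = 1681 contigs and 36362106 - 35833794 = 528312 bp, which would be the contigs shorter than 500 bp.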
