I have SPAdes assemblies for 109 samples of a plant-pathogenic fungus, and I have run BUSCO on all the isolates. I want to compare assembly size and contiguity with the size of the input data. How do I calculate the assembly stats for each isolate and extract them in tabular form? I also want to compare the assembly sizes with the BUSCO stats (complete, partial and duplicated BUSCOs), so how do I extract those stats from each isolate's "short summary" file into a table?
You could combine a bit of bash scripting with the statswrapper.sh script from BBTools/BBMap (https://sourceforge.net/projects/bbmap/) to get the assembly statistics (note that these scripts flip the N50 and L50 values relative to their definitions, and likewise N90 and L90). Extracting the BUSCO stats is more of a text-manipulation job for GNU core utilities or Perl, sed, awk, etc. If you post an example of the output and the desired result, perhaps someone can help you write a quick script to produce it.
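If you would rather keep everything in one script, here is a rough Python sketch of both steps. It is only a starting point under a few assumptions: the function names (contig_lengths, assembly_stats, busco_stats, write_table) are made up for illustration, N50 and L50 follow the conventional definitions (N50 is a length, L50 is a contig count), and the BUSCO short summary is assumed to contain a one-line result of the form C:..%[S:..%,D:..%],F:..%,M:..%,n:.. where F is the fragmented (partial) class. Check that against your own summary files, since the layout can differ between BUSCO versions.

```python
import csv
import re
from pathlib import Path


def contig_lengths(fasta_path):
    """Return the length of every sequence in a FASTA file."""
    lengths, current, seen_header = [], 0, False
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    lengths.append(current)
                seen_header, current = True, 0
            elif seen_header:
                current += len(line)
    if seen_header:
        lengths.append(current)
    return lengths


def assembly_stats(lengths):
    """Size and contiguity stats using the conventional N50/L50 definitions."""
    total = sum(lengths)
    running, n50, l50 = 0, 0, 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running * 2 >= total:  # reached half of the assembly size
            n50, l50 = length, count
            break
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "largest": max(lengths, default=0),
        "N50": n50,
        "L50": l50,
    }


# Assumed one-line BUSCO result, e.g. C:89.0%[S:85.8%,D:3.2%],F:6.9%,M:4.1%,n:3225
BUSCO_FIELD = re.compile(r"\b([CSDFMn]):(\d+(?:\.\d+)?)")


def busco_stats(summary_text):
    """Pull C, S, D, F, M (percent) and n (count) from a short summary."""
    for line in summary_text.splitlines():
        if line.strip().startswith("C:"):
            return {key: float(value) for key, value in BUSCO_FIELD.findall(line)}
    return {}  # no result line found; the table will show NA


def write_table(samples, out_path):
    """samples maps isolate name -> (assembly FASTA path, BUSCO summary path)."""
    fields = ["isolate", "contigs", "total_bp", "largest", "N50", "L50",
              "C", "S", "D", "F", "M", "n"]
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields,
                                delimiter="\t", restval="NA")
        writer.writeheader()
        for isolate, (fasta_path, summary_path) in sorted(samples.items()):
            row = {"isolate": isolate}
            row.update(assembly_stats(contig_lengths(fasta_path)))
            row.update(busco_stats(Path(summary_path).read_text()))
            writer.writerow(row)
```

To use it, build the samples dictionary from however your files are laid out (one FASTA and one short summary per isolate) and call write_table(samples, "assembly_busco_stats.tsv"); the output file name is just an example. The resulting tab-separated file has one row per isolate, which you can then join with your input read sizes.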