I am doing genome assemblies with Canu, followed by two rounds of Racon and two rounds of Pilon.
The first time I ran this protocol on my dataset, I ran BUSCO on the assembly and got a score of 97%. This was a long-read dataset of about 26 GB for a dipteran genome.
I did another sequencing run and added ~10 GB of data to the assembly.
I followed the same protocol and ran BUSCO again. The score decreased to 94%, and the drop is accounted for by an increase in fragmented BUSCOs. How could this be possible? This isn't really a coding question, but I don't see how adding more data could create a worse assembly.
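
In case it helps to see exactly which BUSCO categories moved between the two runs, here is a minimal sketch of how the one-line summaries from the two BUSCO short summary files could be compared. The two summary strings are placeholders I made up to match the 97% and 94% totals, not my actual output, and the `n` value is arbitrary. The function names `parse_busco_summary` and `busco_delta` are just illustrative helpers.

```python
import re

def parse_busco_summary(line):
    """Parse a BUSCO one-line summary, e.g.
    'C:97.0%[S:96.0%,D:1.0%],F:1.0%,M:2.0%,n:1000',
    into a dict of category -> percentage."""
    # Only the percentage fields (C, S, D, F, M) are captured; n has no '%'.
    fields = re.findall(r"([CSDFM]):([\d.]+)%", line)
    return {key: float(value) for key, value in fields}

def busco_delta(before, after):
    """Return the per-category change (after minus before) in percentage points."""
    a = parse_busco_summary(before)
    b = parse_busco_summary(after)
    return {key: round(b[key] - a[key], 2) for key in a}

# Placeholder summary lines (hypothetical values, NOT real results).
first = "C:97.0%[S:96.0%,D:1.0%],F:1.0%,M:2.0%,n:1000"
second = "C:94.0%[S:93.0%,D:1.0%],F:4.0%,M:2.0%,n:1000"

for category, change in busco_delta(first, second).items():
    print(f"{category}: {change:+.1f}")
```

With real summary lines pasted in, this would show whether the loss in complete BUSCOs is matched by a gain in fragmented ones or whether some also moved to missing.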