Correlation between gene clusters and Complete BUSCOs
I am a beginner to the concepts of biosynthetic gene clusters (BGCs) and BUSCOs. I ran antiSMASH and BUSCO analyses on my data. As a preliminary result, I tested for correlation between the number of BGCs and the number of complete BUSCOs, and I observed a negative correlation between the two.

I would like to know whether there is actually any correlation between the two and, if so, how I should test my data accordingly.

It would be great if anyone could help me out with this.

Thank you.

BUSCO aims at evaluating the quality of an assembly: for an ideal assembly, 100% of the expected BUSCOs would be found complete, and they would all be single copy. Since it is an assembly evaluation tool, one would expect BUSCO results to correlate positively with other analyses that depend on genome quality, such as annotation of BGCs.

However, BGC counts may correlate negatively with assembly quality, either because BGCs are more abundant in taxa with more complex genomes, which are more difficult to assemble, or because BGCs themselves pose difficulties for the assembly process.

How many genomes are you analysing? What is their taxonomic distribution? Were all samples sequenced at similar depths and assembled with the same pipeline?
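
If the genomes span several taxonomic groups, one rough way to check whether the negative correlation comes from differences between groups is to compute a rank correlation on the pooled data and then again within each group. The sketch below is only an illustration in plain Python: the function names, the group labels and all the numbers are made up, and Spearman's coefficient is my choice here, not something your data necessarily calls for.

```python
from collections import defaultdict
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_by_group(records, min_size=3):
    """records: iterable of (group, bgc_count, complete_busco_percent)."""
    groups = defaultdict(list)
    for group, bgc, busco in records:
        groups[group].append((bgc, busco))
    result = {}
    for group, pairs in groups.items():
        if len(pairs) >= min_size:  # skip groups too small to be meaningful
            bgc, busco = zip(*pairs)
            result[group] = spearman(bgc, busco)
    return result


# Made-up numbers, for illustration only: (group, BGC count, complete BUSCO %).
records = [
    ("group_A", 5, 96.0), ("group_A", 6, 97.0),
    ("group_A", 7, 98.0), ("group_A", 8, 99.0),
    ("group_B", 30, 85.0), ("group_B", 32, 86.0),
    ("group_B", 35, 88.0), ("group_B", 38, 90.0),
]

overall = spearman([r[1] for r in records], [r[2] for r in records])
per_group = spearman_by_group(records)

print("pooled:", overall)
print("per group:", per_group)
```

With these made-up numbers the pooled coefficient is negative while each group on its own shows a positive one, which is the kind of pattern you would expect if the negative correlation is driven by differences between taxa rather than by BGCs as such. With real data you would also want a p-value and enough genomes per group for the coefficients to mean anything.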


