BUSCO is a successor to CEGMA and is often described as superior to it. However, I doubt that this is so. CEGMA uses a set of ultra-conserved genes: those present in human, fruit fly, nematode, Arabidopsis and two yeasts. BUSCO, by contrast, uses genes that are single-copy in at least 90% of the species in a given lineage, so its criterion for including a gene in the reference set is less strict.
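To make the difference between the two inclusion rules concrete, here is a simplified sketch. The copy-number table, the species names and the threshold parameter are hypothetical and for illustration only. The real pipelines (KOG-based for CEGMA, OrthoDB-based for BUSCO) involve many more steps, and the "present in every species" rule below is only an approximation of how CEGMA's set was built.

```python
# Simplified, hypothetical comparison of the two gene-set inclusion rules.
# Each orthogroup is a dict: species name -> number of gene copies.

def present_in_all(copies: dict[str, int]) -> bool:
    """CEGMA-like rule (simplified): the gene must be present in every species."""
    return all(n >= 1 for n in copies.values())


def single_copy_in_fraction(copies: dict[str, int], threshold: float = 0.9) -> bool:
    """BUSCO-like rule (simplified): single copy in at least `threshold` of species."""
    single = sum(1 for n in copies.values() if n == 1)
    return single / len(copies) >= threshold


# Hypothetical orthogroup across 20 species: single copy in 19, lost in one.
og = {f"sp{i}": 1 for i in range(19)}
og["sp19"] = 0

print(present_in_all(og))           # False: excluded by the strict rule
print(single_copy_in_fraction(og))  # True: 19/20 = 0.95 >= 0.9
```

A gene like this one, lost in some lineages, can enter a BUSCO-style set but not a CEGMA-style one. That is exactly why a species under study may legitimately lack some BUSCO genes.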
Thus, when I assemble a genome and find 95% of the CEGMA genes, I can be fairly sure that about 95% of all the species' genes are assembled, since a gene present in all of these distantly related species should be present in almost all eukaryotes, except perhaps some very exotic ones. On the other hand, finding 95% of the BUSCO genes in my assembly doesn't really tell me how good the assembly is, because the result is ambiguous: either the genome of my species contains only 95% of the BUSCO genes, in which case the assembly is essentially complete, or the genome contains all of them, in which case 5% of them are missing from the assembly.
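The ambiguity can be written as a product of two unknowns. This is a minimal sketch under a simplifying assumption of mine, not something either tool computes. The assumption is that genes are missing from the assembly independently of whether they belong to the reference set, so the observed score is roughly (fraction of the reference set actually in the genome) × (assembly completeness). The function name `observed_fraction` is hypothetical.

```python
# Hypothetical model: observed score = genome_fraction * assembly_completeness,
# assuming assembly gaps hit reference-set genes at random.

def observed_fraction(genome_fraction: float, assembly_completeness: float) -> float:
    """Expected fraction of reference genes found in the assembly."""
    return genome_fraction * assembly_completeness


# The two scenarios from the text give the same score.
perfect_assembly = observed_fraction(0.95, 1.00)     # genome lacks 5% of the set
incomplete_assembly = observed_fraction(1.00, 0.95)  # assembly lacks 5% of genes
print(round(perfect_assembly, 2), round(incomplete_assembly, 2))  # 0.95 0.95

# For a near-universal (CEGMA-like) set, genome_fraction is close to 1,
# so the observed score is close to assembly completeness itself.
```

Under this model, a score only pins down assembly completeness when the first factor is known to be close to 1. That is the property the question attributes to the CEGMA set.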
So the question is: am I right that BUSCO is worse than CEGMA for estimating assembly completeness?