Dear all, I have assembled an algal genome and predicted ~9k genes using Augustus. The genes were evaluated using BUSCO. Only 132 genes were assigned to the BUSCO categories (C, S, D, F and M). When I ran the same check for C. reinhardtii, around 300 genes were assigned to those categories. But the article reports that ~80% (I do not recall the exact number) of the genes are complete.
My question is: how should I interpret the BUSCO output?
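A note on reading the output, as a minimal sketch: BUSCO prints a one-line summary of the form `C:..%[S:..%,D:..%],F:..%,M:..%,n:..`, where the percentages are relative to `n`, the number of BUSCO groups in the chosen lineage dataset, and not to the number of predicted genes. The Python below parses such a line; the function name `parse_busco_summary` and the example values are made up for illustration and do not come from any real run.

```python
import re

# BUSCO one-line summary notation:
# C = complete, S = complete single-copy, D = complete duplicated,
# F = fragmented, M = missing, n = BUSCO groups searched in the lineage dataset.
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Return the percentages (floats) and n (int) from a BUSCO summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("no BUSCO summary found in line")
    stats = {key: float(value) for key, value in match.groupdict().items() if key != "n"}
    stats["n"] = int(match.group("n"))
    return stats


# Hypothetical summary line, for illustration only.
example = "C:80.0%[S:78.0%,D:2.0%],F:5.0%,M:15.0%,n:300"
stats = parse_busco_summary(example)

# Percentages are fractions of n, so convert back to a count of BUSCO groups.
complete_count = round(stats["C"] / 100 * stats["n"])
print(f"{complete_count} of {stats['n']} BUSCO groups are complete")
```

Read this way, a total of a few hundred across the categories reflects the size of the lineage dataset rather than how many of the ~9k predicted genes were evaluated.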
Somewhat related issue:
A: Is BUSCO really better than CEGMA for genome assembly quality evaluation?
I recently was in a similar situation and in the end figured out that BUSCO is not really suited (adequate) for estimating completeness in algal genomes, mainly due to biases in their core gene set.
Did you opt for any other approach to address this concern?
Yes, the approach mentioned in that Veeckman et al. paper. I am not sure whether that is a public dataset, though; they are colleagues of mine here in the lab, so I could easily get hold of it :)
Combined with more classical approaches, such as statistics on the fraction of RNA-seq reads mapped (and the genes covered by them, etc.).
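Those classical statistics can be sketched roughly as follows. This assumes you already have the text output of `samtools flagstat` for the RNA-seq alignment and a table of read counts per predicted gene from whatever counting tool you use; the function names, the `min_reads` threshold and all numbers below are hypothetical.

```python
import re


def mapped_fraction(flagstat_text):
    """Fraction of alignment records reported as mapped in samtools flagstat output."""
    total = re.search(r"^(\d+) \+ \d+ in total", flagstat_text, re.MULTILINE)
    mapped = re.search(r"^(\d+) \+ \d+ mapped \(", flagstat_text, re.MULTILINE)
    if total is None or mapped is None:
        raise ValueError("could not find 'in total' and 'mapped' lines")
    n_total = int(total.group(1))
    return int(mapped.group(1)) / n_total if n_total else 0.0


def genes_covered_fraction(read_counts, min_reads=1):
    """Fraction of genes supported by at least min_reads reads."""
    if not read_counts:
        return 0.0
    covered = sum(1 for count in read_counts.values() if count >= min_reads)
    return covered / len(read_counts)


# Made-up inputs, for illustration only.
example_flagstat = (
    "1000 + 0 in total (QC-passed reads + QC-failed reads)\n"
    "0 + 0 secondary\n"
    "900 + 0 mapped (90.00% : N/A)\n"
)
example_counts = {"g1": 12, "g2": 0, "g3": 3, "g4": 0}

print(f"mapped: {mapped_fraction(example_flagstat):.1%}")
print(f"genes covered: {genes_covered_fraction(example_counts):.1%}")
```

Note that the flagstat total counts alignment records, so secondary and supplementary alignments are included; filter them out first if you want a per-read fraction.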