How can you measure the completeness of an annotation process?
13 months ago

I am annotating a plant genome using MAKER-P. I used EST and transcriptome data, and reduced the redundancy in the ESTs using CD-HIT. After three rounds of MAKER (est2genome and protein2genome, followed by training SNAP twice and training Augustus twice), I now have a total set of genes. I was expecting more genes than I have, although this is a novel genome with no reference.

How can I tell if my annotation is complete?

Thanks

What is your expectation based on? You could compare with related species.

Closely related species have gene counts of about 26,857, 23,197, and 22,427, but the paper that reported these counts had a CEGMA completeness (complete CEGs) of 86.29%.

And how many do you have?

I have 17,973 genes, with a BUSCO score of C:68.4%[S:64.5%,D:3.9%],F:6.0%,M:25.6%,n:1440

The BUSCO score for the genome assembly is 93.7%

I ran BUSCO with this command line:

python /mnt/bin/busco/scripts/run_BUSCO.py -i  ~.maker.transcripts.fasta -o output -l \${LINEAGE} -m transcriptome -c 15  -sp my_species  -z --augustus_parameters='--progress=true'

You lost 25% of the BUSCO genes during the annotation process. This is not good.
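For anyone who wants to check that figure, here is a minimal Python sketch that parses the BUSCO one-line summary quoted in this thread and subtracts the annotation's complete percentage from the assembly's. It assumes the 93.7% reported for the assembly is the complete (C) value; the function name `parse_busco` is only illustrative and is not part of BUSCO.

```python
import re

def parse_busco(summary):
    """Parse a BUSCO short summary line into a dict (percentages as floats)."""
    pattern = (r"C:([\d.]+)%\[S:([\d.]+)%,D:([\d.]+)%\],"
               r"F:([\d.]+)%,M:([\d.]+)%,n:(\d+)")
    match = re.search(pattern, summary)
    if match is None:
        raise ValueError("not a BUSCO summary line: %r" % summary)
    c, s, d, f, m = (float(x) for x in match.groups()[:5])
    return {"C": c, "S": s, "D": d, "F": f, "M": m, "n": int(match.group(6))}

# Annotation (transcript) result quoted in the thread.
annotation = parse_busco("C:68.4%[S:64.5%,D:3.9%],F:6.0%,M:25.6%,n:1440")

# Assumption: the 93.7% quoted for the assembly is its complete (C) score.
assembly_complete = 93.7

# Percentage points of complete BUSCOs lost between assembly and annotation.
lost_points = round(assembly_complete - annotation["C"], 1)
print("Complete BUSCOs lost during annotation: %.1f points" % lost_points)
```

The difference is in percentage points of the n:1440 BUSCO set, so it only says how many conserved genes the annotation misses, not how many genes are missing overall.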

I am trying to use BRAKER to re-annotate and evaluate, but it has been very difficult to use. It keeps dying without any error message.

Do you have any suggestions on how to recover the lost 25% of BUSCO genes?

Did you activate the keep_preds parameter?

No, I did not activate keep_preds. When I set keep_preds=1, it gives proteins with an AED of 1, for example:
mRNA-1 protein AED:1.00 eAED:1.00 QI:0|0|0|0|1|1|6|0|661

That is normal: it adds predictions that do not have any support from the evidence (EST or protein).
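To separate those unsupported predictions from the evidence-backed ones, something like the following Python sketch could be used. It parses the AED, eAED and QI fields from a MAKER FASTA header in the format shown in the example above. The helper names and the `max_aed` cutoff are assumptions for illustration, not MAKER options.

```python
import re

# Matches the "AED:... eAED:... QI:..." part of a MAKER FASTA header.
HEADER_RE = re.compile(r"AED:([\d.]+)\s+eAED:([\d.]+)\s+QI:(\S+)")

def parse_maker_header(header):
    """Extract AED, eAED and the raw QI fields from a MAKER header line."""
    match = HEADER_RE.search(header)
    if match is None:
        raise ValueError("no AED/eAED/QI fields in header: %r" % header)
    return {
        "aed": float(match.group(1)),
        "eaed": float(match.group(2)),
        "qi": match.group(3).split("|"),  # kept as raw strings
    }

def has_evidence_support(record, max_aed=1.0):
    """True if AED is below the cutoff; an AED of 1 means no evidence overlap."""
    return record["aed"] < max_aed

# Header taken from the example in this thread.
record = parse_maker_header(
    "mRNA-1 protein AED:1.00 eAED:1.00 QI:0|0|0|0|1|1|6|0|661"
)
print(record["aed"], has_evidence_support(record))
```

Applied to every header in the protein FASTA, this gives a quick count of how many of the keep_preds models are purely ab initio.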

Can one proceed with these unsupported predictions?

So run with keep_preds. If you end up with between 25,000 and 30,000 genes, that is fine, and your BUSCO score will be much better. You can then also try a run without SNAP and check the BUSCO score again. Deactivating it can give better results.

13 months ago
Juke34 ★ 6.4k

Run BUSCO on your assembly. Then take the proteins you have predicted (all of them, including isoforms) and run BUSCO in protein mode. Compare the overall results (ignore the duplicated ones). The two should be pretty close. If the BUSCO score on the proteins is far below that of the assembly, you have a problem in the annotation steps.
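As a rough sketch of the protein-mode run, the Python snippet below builds a command line that mirrors the one posted earlier in the thread, with the mode switched to `proteins`. The protein FASTA name, output name and lineage path are placeholders, not real files, and the script only prints the command rather than running it.

```python
import shlex

def busco_protein_command(script, proteins_fasta, out_name, lineage_dir, cpus=15):
    """Build a run_BUSCO.py argument list for protein mode."""
    return [
        "python", script,
        "-i", proteins_fasta,   # predicted proteins, isoforms included
        "-o", out_name,
        "-l", lineage_dir,
        "-m", "proteins",       # protein mode instead of transcriptome
        "-c", str(cpus),
    ]

# All paths below except the script location are hypothetical placeholders.
cmd = busco_protein_command(
    script="/mnt/bin/busco/scripts/run_BUSCO.py",
    proteins_fasta="my_genome.maker.proteins.fasta",
    out_name="annotation_proteins",
    lineage_dir="/path/to/lineage",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed line can be pasted into a shell, and its C value then compared with the one from the assembly run.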