Question

Interpreting first round of maker

3.1 years ago

gilsorek12 ▴ 10

Hi, this is my first time using the maker genome annotation pipeline.

I recently finished maker's first round and was surprised by the results (I was expecting better).

I used minimap2 to align a de novo transcriptome to the reference genome and let maker align known Crustacean protein sequences and mRNA sequences of my species from NCBI.

Prior to running maker I used BUSCO to evaluate my de novo transcriptome assembly and the genome (using metaeuk):

Transcriptome: C:99.6%[S:7.4%,D:92.2%],F:0.2%,M:0.2%,n:1013
Genome: C:88.5%[S:37.7%,D:50.8%],F:7.8%,M:3.7%,n:1013

I ran BUSCO on all the transcripts maker predicted to evaluate the results:

C:64.6%[S:34.5%,D:30.1%],F:19.2%,M:16.2%,n:1013

Although this is only the first round, what might cause ~160 BUSCOs to be missing from maker's predictions?

Can anyone share from experience whether this is common?

Maybe my expectations were too high and these are actually good first-round results?

Regarding training the ab initio annotation tools, would you use BUSCO to train Augustus? I have seen some tutorials that take training sequences from the mRNA annotations created in the first round (with 1000 bp of flanking sequence on each side), while others recommend filtering them first (as in this post: "gene set filter/selection for training ab initio annotation tools") and then training Augustus directly.
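For reference, the ~160 figure can be recovered from the BUSCO summary lines themselves. Below is a minimal Python sketch, assuming the one-line summary format shown above. The function name `parse_busco_summary` is made up for illustration, and the counts are rounded from the percentages, so they may differ by one from BUSCO's own full table.

```python
import re

# Regex for the BUSCO one-line summary format quoted in this post.
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)

def parse_busco_summary(line):
    """Return (percentages, approximate counts, n) for one summary line."""
    match = BUSCO_PATTERN.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    n = int(match.group("n"))
    pct = {key: float(match.group(key)) for key in "CSDFM"}
    # Counts are back-calculated from rounded percentages (approximate).
    counts = {key: round(value * n / 100) for key, value in pct.items()}
    return pct, counts, n

summaries = {
    "transcriptome": "C:99.6%[S:7.4%,D:92.2%],F:0.2%,M:0.2%,n:1013",
    "genome": "C:88.5%[S:37.7%,D:50.8%],F:7.8%,M:3.7%,n:1013",
    "maker round 1": "C:64.6%[S:34.5%,D:30.1%],F:19.2%,M:16.2%,n:1013",
}

for name, line in summaries.items():
    pct, counts, n = parse_busco_summary(line)
    print(f"{name}: complete~{counts['C']}, fragmented~{counts['F']}, "
          f"missing~{counts['M']} of {n}")
```

For the maker line this gives about 164 missing BUSCOs (16.2% of 1013), compared with about 37 missing in the genome itself.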

Thanks for consideration and help.

maker busco annotation • 1.3k views

ADD COMMENT • link updated 3.1 years ago by Dave Carlson ★ 1.7k • written 3.1 years ago by gilsorek12 ▴ 10

As far as I know, the accuracy of Maker depends on how well the ab initio predictors (Augustus, SNAP, etc.) are trained. In your case, it looks like the predictors are doing a poor job. Using BUSCO to train Augustus is a good start if it is not already trained for your species (I am assuming this is the case). By the way, BUSCO is a good estimate, but you should also pay attention to other metrics, like the total number of predicted genes (does it make sense for your species?), the average size of predicted proteins, introns, etc.
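A quick way to look at some of those other metrics is to summarize the predicted protein FASTA. This is only a rough Python sketch: the helper names `read_fasta` and `protein_stats` are invented here, and the three toy sequences are made-up placeholders, not real Maker output. In practice you would pass the lines of your own protein FASTA file instead.

```python
from statistics import mean, median

def read_fasta(lines):
    """Parse FASTA lines into {sequence_id: sequence}."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # ID is the first word of the header
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}

def protein_stats(proteins):
    """Count proteins and summarize their lengths (trailing '*' ignored)."""
    lengths = [len(seq.rstrip("*")) for seq in proteins.values()]
    return {
        "n_proteins": len(lengths),
        "mean_len": mean(lengths),
        "median_len": median(lengths),
        "min_len": min(lengths),
        "max_len": max(lengths),
    }

# Toy input for illustration only (hypothetical sequences).
toy_fasta = """>g1-mRNA-1
MKTAYIAK
>g2-mRNA-1
MSS*
>g3-mRNA-1
MAAAAAGGGG
LLLL
""".splitlines()

stats = protein_stats(read_fasta(toy_fasta))
for key, value in stats.items():
    print(f"{key}: {value}")
```

Comparing the protein count and length distribution against a well-annotated related species is a simple sanity check alongside BUSCO.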

ADD REPLY • link 3.1 years ago by alex.zaccaron ▴ 410

score 1 · Answer 1 · 2021-03-13

What I do when running Maker (which more-or-less follows the procedure outlined here):

1. Run Maker without any ab initio gene prediction, using only the protein and transcript evidence
2. Filter the gene predictions (e.g., max AED of 0.25 and some minimum protein length)
3. Train SNAP on the filtered gene predictions from your first Maker run
4. At the same time, train Augustus with BUSCO
5. Rerun Maker with the newly trained SNAP and Augustus models
6. Filter the Maker results and retrain SNAP and Augustus using the filtered Maker results
7. Repeat steps 5 & 6 one or two more times
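The filtering step (max AED plus a minimum protein length) could be sketched roughly as below. This is an illustration, not the exact script I use. It assumes the Maker GFF3 stores the score as an `_AED` attribute on mRNA features, estimates protein length as total CDS length divided by 3 (so a stop codon inside the CDS is counted), and uses a 50 aa minimum that is an arbitrary placeholder. The function names and toy records are hypothetical.

```python
from collections import defaultdict

def parse_attributes(field):
    """Turn a GFF3 attribute column ('k=v;k=v') into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs

def select_training_mrnas(gff_lines, max_aed=0.25, min_protein_len=50):
    """Return sorted mRNA IDs passing the AED and length thresholds."""
    aed = {}
    cds_len = defaultdict(int)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        if ftype == "mRNA" and "_AED" in attrs and "ID" in attrs:
            aed[attrs["ID"]] = float(attrs["_AED"])
        elif ftype == "CDS":
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    cds_len[parent] += end - start + 1
    keep = []
    for mrna_id, score in aed.items():
        protein_len = cds_len[mrna_id] // 3  # approximate length in aa
        if score <= max_aed and protein_len >= min_protein_len:
            keep.append(mrna_id)
    return sorted(keep)

def row(ftype, start, end, attrs):
    """Build one toy GFF3 line (hypothetical data)."""
    return "\t".join(["chr1", "maker", ftype, str(start), str(end),
                      ".", "+", ".", attrs])

toy_gff = [
    "##gff-version 3",
    row("mRNA", 1, 300, "ID=m1;Parent=g1;_AED=0.10"),
    row("CDS", 1, 300, "ID=m1:cds;Parent=m1"),
    row("mRNA", 1, 600, "ID=m2;Parent=g2;_AED=0.40"),   # fails AED
    row("CDS", 1, 600, "ID=m2:cds;Parent=m2"),
    row("mRNA", 1, 90, "ID=m3;Parent=g3;_AED=0.05"),    # too short
    row("CDS", 1, 90, "ID=m3:cds;Parent=m3"),
]

print(select_training_mrnas(toy_gff))
```

The resulting ID list can then be used to subset the GFF3 before converting it to the training format each predictor expects.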

This procedure seems to work reasonably well for the two plant genomes I've run it on. Obviously no protein coding gene annotation of a newly assembled genome is going to be perfect or complete. But I think this is a decent first step.