Question

Low BUSCO scores after annotating a genome using the same model as another species in the genus

13 hours ago

Wilber0x ▴ 60

I am annotating the genome of a non-model monocot species. I have successfully annotated the sister species using BRAKER3 with a lot of RNA-seq data as input. The BUSCO scores for the annotation of the sister species are as follows:

C:96.3%[S:92.9%,D:3.3%],F:1.0%,M:2.7%,n:1614

I have assembled the genome with hifiasm and evaluated it with QUAST. The assembly is very complete, with these BUSCO scores:

C:98.8%[S:94.7%,D:4.1%],F:0.5%,M:0.7%,n:1614

I softmasked my genome with EDTA and then tried to annotate it with BRAKER3, except that I have no RNA-seq data. I am using the protein sequence FASTA file and the species model from the BRAKER3 outputs of the sister species. I am also using two protein sequence files that I made from the BUSCO database and from monocot sequences on Phytozome. Here is my BRAKER command:

 apptainer exec -B ${PWD}:${PWD} ${BRAKER_SIF} /opt/BRAKER/scripts/braker.pl \
 --genome=/home/genomeAssembly.fa \
 --GENEMARK_PATH=${ETP}/gmes \
 --gff3 \
 --species=sisterSpeciesModel --useexisting \
 --prot_seq=/home/phytozomePro.fa,/home/proteins_odb10_plants.fa,/home/sisterSpeciesPro.fasta \
 --softmasking \
 --AUGUSTUS_CONFIG_PATH=/home/augustus_config/config
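
Because the shell splits a command on unquoted spaces, the list given to --prot_seq has to reach braker.pl as a single comma-separated word, and --species must be joined to its value with no spaces around the equals sign. A minimal sketch for building the protein argument from the three files named in the command above; the helper name join_by_comma is hypothetical and not part of BRAKER:

```shell
# Join all arguments with commas and no spaces.
# The function body runs in a subshell, so changing IFS does not leak out.
join_by_comma() (
    IFS=,
    printf '%s\n' "$*"
)

# File paths are the ones from the command above.
prot_seq_arg="--prot_seq=$(join_by_comma \
    /home/phytozomePro.fa \
    /home/proteins_odb10_plants.fa \
    /home/sisterSpeciesPro.fasta)"

# Show the single word that would be handed to braker.pl.
echo "$prot_seq_arg"
```

The resulting string can be placed in the braker.pl call in place of a hand-typed list.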

Unfortunately, the BUSCO scores for the annotation are low:

C:40.5%[S:34.9%,D:5.6%],F:27.9%,M:31.7%,n:1614

How can I increase the BUSCO scores without generating my own RNA-seq data?

braker genome annotation braker3 • 69 views

updated 12 minutes ago by Panos ★ 1.9k • written 13 hours ago by Wilber0x ▴ 60

First off, without RNA-seq you'll never get to the level of your sister species.

That said, 40% seems far too low to me; I would expect above 70-80%. To improve your BUSCO scores, try different protein databases as evidence. The ones you used make perfect sense, but also try them one at a time rather than all together. Other things to try: a very inclusive protein set such as Swiss-Prot, or the relevant slice of UniRef50 (i.e. only the monocot UniRef50 clusters); translated peptides from the transcriptome assembly of your sister species; and the sister's RNA-seq itself, although the last time I tried that I saw no improvement (then again, I wasn't starting from ~40% BUSCO). Trying different combinations of the above will hopefully get you a better BUSCO score.
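
One way to organise the "one protein set at a time" comparison is sketched below. It only prints the BRAKER commands, one per protein file and each with its own --workingdir, and includes a small helper for pulling the complete (C) percentage out of a BUSCO summary line so runs can be compared. The function names and the braker_<name> directory names are made up for illustration, and the remaining options from the original command (species model, GeneMark and AUGUSTUS paths, the apptainer wrapper) would need to be added back:

```shell
# Extract the "C:" (complete) percentage from a one-line BUSCO summary.
busco_complete() {
    printf '%s\n' "$1" | sed -n 's/^C:\([0-9.]*\)%.*/\1/p'
}

# Print (do not run) one braker.pl command per protein file,
# each writing to a separate working directory.
print_braker_runs() {
    for prot in "$@"; do
        name=$(basename "$prot")   # e.g. phytozomePro.fa
        name=${name%.*}            # strip the extension
        echo "braker.pl --genome=/home/genomeAssembly.fa --prot_seq=${prot} --softmasking --gff3 --workingdir=braker_${name}"
    done
}

print_braker_runs \
    /home/phytozomePro.fa \
    /home/proteins_odb10_plants.fa \
    /home/sisterSpeciesPro.fasta

# prints 40.5
busco_complete 'C:40.5%[S:34.9%,D:5.6%],F:27.9%,M:31.7%,n:1614'
```

Running BUSCO on each working directory's predicted proteins and comparing the extracted C values shows which evidence set helps most.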

Lastly, back when I last ran BRAKER, I had to include the --epmode parameter when I only had protein evidence.

12 minutes ago by Panos ★ 1.9k