Low BUSCO scores after annotating a genome using the same species model as another species in the genus
13 hours ago
Wilber0x ▴ 60

I am annotating the genome of a non-model monocot species. I have successfully annotated the sister species using BRAKER3 with a lot of RNA-seq data as input. The BUSCO scores for the annotation of the sister species are as follows:

C:96.3%[S:92.9%,D:3.3%],F:1.0%,M:2.7%,n:1614 

I assembled the genome of my species with hifiasm and evaluated it with QUAST. The assembly is very complete:

C:98.8%[S:94.7%,D:4.1%],F:0.5%,M:0.7%,n:1614

I softmasked my genome with EDTA and then tried to annotate it with BRAKER3, but I have no RNA-seq data. Instead, I am using the protein FASTA file and the species model from the BRAKER3 output of the sister species, plus two protein sequence files that I made from the BUSCO database and from monocot sequences on Phytozome. Here is my BRAKER command:

    apptainer exec -B ${PWD}:${PWD} ${BRAKER_SIF} /opt/BRAKER/scripts/braker.pl \
        --genome=/home/genomeAssembly.fa \
        --GENEMARK_PATH=${ETP}/gmes \
        --gff3 \
        --species=sisterSpeciesModel --useexisting \
        --prot_seq=/home/phytozomePro.fa,/home/proteins_odb10_plants.fa,/home/sisterSpeciesPro.fasta \
        --softmasking \
        --AUGUSTUS_CONFIG_PATH=/home/augustus_config/config

Unfortunately, the BUSCO scores for the resulting annotation are low:

C:40.5%[S:34.9%,D:5.6%],F:27.9%,M:31.7%,n:1614 

How can I increase the BUSCO scores without generating my own RNA-seq data?

braker genome annotation braker3 • 69 views

First off, without RNA-seq data you'll never get to the level of your sister species' annotation.

That said, 40% seems way too low to me; I would expect >70-80%. To improve your BUSCO scores, you should try different kinds of protein databases as evidence. The ones you used make perfect sense, but a few variations are worth testing:

- Use your protein sets one at a time rather than all together.
- Use a very inclusive protein data set such as SwissProt, or the relevant slice of UniRef50 (i.e. only the monocot UniRef50 clusters).
- Use translated peptides from the transcriptome assembly of your sister species.
- Use the sister species' RNA-seq data directly, although the last time I tried this I didn't get any improvement (however, I wasn't at ~40% BUSCO scores).

Trying different combinations of the above will hopefully get you a better BUSCO score.
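To make the one-set-at-a-time comparison less error-prone, here is a minimal dry-run sketch that only prints one BRAKER command per protein set; it does not run anything. The file paths and species name are copied from the question. The working directory names (`run_phytozome` and so on) and the helper functions `join_by_comma` and `build_braker_cmd` are made up for illustration, and I left out the apptainer wrapper and the `--GENEMARK_PATH` / `--AUGUSTUS_CONFIG_PATH` options, which you would need to add back. The helper joins the files with commas and no spaces, because a space after a comma makes the shell pass the remaining files as separate arguments.

```shell
#!/usr/bin/env bash
# Sketch (dry run): print one braker.pl command per protein evidence set.
# Paths and species name come from the question; working directory names
# and function names are hypothetical.

GENOME=/home/genomeAssembly.fa
SPECIES=sisterSpeciesModel

# Join all arguments with commas and no spaces.
join_by_comma() {
    local IFS=,
    echo "$*"
}

# Print the command for a working directory ($1) and one or more protein files.
build_braker_cmd() {
    local workdir=$1
    shift
    local prot
    prot=$(join_by_comma "$@")
    echo "braker.pl --genome=${GENOME} --species=${SPECIES} --useexisting" \
         "--prot_seq=${prot} --softmasking --gff3 --workingdir=${workdir}"
}

build_braker_cmd run_phytozome /home/phytozomePro.fa
build_braker_cmd run_odb10     /home/proteins_odb10_plants.fa
build_braker_cmd run_sister    /home/sisterSpeciesPro.fasta
build_braker_cmd run_all       /home/phytozomePro.fa /home/proteins_odb10_plants.fa /home/sisterSpeciesPro.fasta
```

Giving each run its own `--workingdir` keeps the outputs apart, so you can run BUSCO on each annotation and see which evidence set helps.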

Lastly, back when I last ran BRAKER, I had to include the --ep_mode parameter when I only had protein evidence.
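If you end up with several runs to compare, it helps to pull the completeness number out of each BUSCO summary line instead of eyeballing them. A small sketch, using the two annotation summaries from the question as input; the function name `busco_complete` is made up, and it assumes the one-line summary format shown in the question (starting with `C:`).

```shell
#!/usr/bin/env bash
# Sketch: extract the Complete (C) percentage from BUSCO one-line summaries
# and report the gap between two annotations.

# Print the number between "C:" and the first "%".
busco_complete() {
    sed -E 's/^[[:space:]]*C:([0-9.]+)%.*/\1/' <<< "$1"
}

# Summary lines copied from the question.
sister='C:96.3%[S:92.9%,D:3.3%],F:1.0%,M:2.7%,n:1614'
target='C:40.5%[S:34.9%,D:5.6%],F:27.9%,M:31.7%,n:1614'

# awk does the subtraction because bash has no floating-point arithmetic.
awk -v a="$(busco_complete "$sister")" -v b="$(busco_complete "$target")" \
    'BEGIN { printf "gap in complete BUSCOs: %.1f points\n", a - b }'
```

The same function can be applied to the summary of each test run to rank the protein sets.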
