For your first question regarding annotation of the genes identified by BUSCO: BUSCO assesses genome completeness using single-copy orthologs but does not provide functional annotation. There is no dedicated tool that directly uses BUSCO output for annotation. Instead, annotate the entire assembly with a tool such as Prokka, which predicts genes and assigns functions via database searches. This will cover the BUSCO-identified genes as part of the process. Alternatively, for specific protein sequences from BUSCO, perform individual annotations using BLAST against UniProt or InterProScan.
prokka --outdir my_annotation --prefix my_genome --kingdom Bacteria assembly.fasta
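If you only want functions for the BUSCO hits themselves, the BLAST/InterProScan route mentioned above could look roughly like this. It is a sketch that assumes BUSCO v4 or later, which should write the single-copy protein sequences as `.faa` files under `run_<lineage>/busco_sequences/single_copy_busco_sequences/`. The helper name `collect_busco_proteins`, the directory `busco_out/run_bacteria_odb10`, and the BLAST database name `uniprot_sprot` are placeholders, so check them against your own run.

```shell
# Concatenate the per-BUSCO protein FASTA files into one query file.
# $1 = directory holding the *.faa files, $2 = output FASTA (both hypothetical names).
collect_busco_proteins() {
    seq_dir=$1
    out_faa=$2
    : > "$out_faa"                      # start from an empty output file
    for f in "$seq_dir"/*.faa; do
        [ -e "$f" ] || continue         # skip if the glob matched nothing
        cat "$f" >> "$out_faa"
    done
    grep -c '^>' "$out_faa" || true     # print how many sequences were collected
}

# Usage (uncomment once the paths match your BUSCO run):
# collect_busco_proteins busco_out/run_bacteria_odb10/busco_sequences/single_copy_busco_sequences busco_proteins.fasta
# interproscan.sh -i busco_proteins.fasta -f tsv -o busco_interpro.tsv
# blastp -query busco_proteins.fasta -db uniprot_sprot -outfmt 6 -evalue 1e-5 -out busco_vs_uniprot.tsv
```

Write the combined file outside the BUSCO sequence directory (or give it a non-`.faa` extension, as here) so it is not picked up as input on a rerun.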
For your second question on addressing an incomplete genome from SPAdes with short reads: Incomplete assemblies often result from low coverage, poor read quality, or repetitive regions. Increase sequencing depth to at least 50x if possible. Trim and filter reads using Trimmomatic to remove adapters and low-quality bases. If long reads are available, switch to a hybrid assembler like Unicycler, which combines short and long reads to resolve gaps.
unicycler -1 short_R1.fastq -2 short_R2.fastq -l long_reads.fastq -o output_dir
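To check whether you are near the 50x mark before re-sequencing, depth can be estimated as (number of reads × read length) / genome size. The sketch below does that arithmetic and shows a typical Trimmomatic call; the helper name `estimate_coverage`, the output file names, and the adapter file `TruSeq3-PE.fa` are assumptions, and the trimming settings are the ones used in the Trimmomatic manual's paired-end example rather than values tuned for your data.

```shell
# Estimated depth = (number of reads x read length) / genome size.
# $1 = total reads (both mates counted), $2 = read length in bp, $3 = genome size in bp.
estimate_coverage() {
    awk -v n="$1" -v len="$2" -v g="$3" 'BEGIN { printf "%.1f\n", n * len / g }'
}

# Example: 2,000,000 reads of 150 bp on a 5 Mb genome prints 60.0
estimate_coverage 2000000 150 5000000

# Adapter and quality trimming for paired-end reads (uncomment to run).
# trimmomatic PE -phred33 short_R1.fastq short_R2.fastq \
#     trimmed_R1.fastq unpaired_R1.fastq trimmed_R2.fastq unpaired_R2.fastq \
#     ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
```

If Trimmomatic was installed as a plain jar rather than through a package manager, the call would start with `java -jar` and the path to the jar instead of `trimmomatic`.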
For your third question on reference-based assembly tools: Yes, tools exist for reference-guided assembly of bacterial genomes. Ragout scaffolds existing contigs using one or more reference genomes. Rebaler is meant for long reads: it aligns them to the reference with minimap2 and then polishes the consensus with Racon.
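As a rough illustration of how these two tools are invoked: Ragout takes a small recipe file naming the references and the target assembly, while Rebaler takes a reference and a read file and writes the assembly to standard output. Everything below is a sketch; the helper `write_ragout_recipe`, the genome labels `ref1` and `my_asm`, and all file names are placeholders, and the executable may be called `ragout` or `ragout.py` depending on how it was installed.

```shell
# Write a minimal Ragout recipe: one reference, one target assembly.
# $1 = reference FASTA, $2 = contigs FASTA, $3 = recipe file to create.
write_ragout_recipe() {
    cat > "$3" <<EOF
.references = ref1
.target = my_asm

ref1.fasta = $1
my_asm.fasta = $2
EOF
}

# Usage (uncomment once the tools are installed):
# write_ragout_recipe reference.fasta contigs.fasta recipe.rcp
# ragout recipe.rcp -o ragout_out
# rebaler reference.fasta long_reads.fastq > rebaler_assembly.fasta
```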
For your fourth question on a pipeline for assembly, annotation, and gene prediction: Use this sequence for isolated bacterial data with short reads. First, quality control with FastQC and Trimmomatic. Assemble with SPAdes or MEGAHIT. Assess quality with QUAST and BUSCO. Annotate with Prokka, which includes gene prediction via Prodigal. For more comprehensive annotation, submit to NCBI's PGAP. The nf-core/bacass pipeline automates much of this process if you have Nextflow installed.
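The manual version of that sequence can be sketched as a function that only prints the commands, so you can review them before running anything. The function name `plan_pipeline`, the output naming scheme, the adapter file `TruSeq3-PE.fa`, and the BUSCO lineage `bacteria_odb10` are assumptions; `--isolate` also assumes a reasonably recent SPAdes release.

```shell
# Print the commands for a short-read isolate pipeline, one per line, without running them.
# $1 = forward reads, $2 = reverse reads, $3 = sample name used as output prefix.
plan_pipeline() {
    r1=$1; r2=$2; s=$3
    echo "fastqc $r1 $r2"
    echo "trimmomatic PE -phred33 $r1 $r2 ${s}_R1.trim.fq ${s}_R1.unp.fq ${s}_R2.trim.fq ${s}_R2.unp.fq ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:15 MINLEN:36"
    echo "spades.py --isolate -1 ${s}_R1.trim.fq -2 ${s}_R2.trim.fq -o ${s}_spades"
    echo "quast.py ${s}_spades/contigs.fasta -o ${s}_quast"
    echo "busco -i ${s}_spades/contigs.fasta -m genome -l bacteria_odb10 -o ${s}_busco"
    echo "prokka --outdir ${s}_prokka --prefix $s --kingdom Bacteria ${s}_spades/contigs.fasta"
}

# Review the plan first; pipe it to `sh -e` to run it and stop at the first failure.
plan_pipeline reads_R1.fastq reads_R2.fastq my_genome
# plan_pipeline reads_R1.fastq reads_R2.fastq my_genome | sh -e
```

Printing first keeps the sketch safe to run on a machine where none of the tools are installed yet, and makes it easy to swap one step (for example MEGAHIT for SPAdes) by editing a single line.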
Kevin
I'd suggest looking at shovill and prokka from Torsten Seemann (assuming you have Illumina short reads).
Some comments rather than an answer.
BUSCO only checks for universal single-copy orthologs; it does not do full gene annotation. You'll need one of the many genome annotation tools available. Check publications of recent bacterial reference genomes to find relevant ones.
Yes, there are a few reference-based assemblers, but they come with some pretty major caveats. For example, the result will inherit any mis-assemblies and any major genomic rearrangements from the reference. Usually this is not a good idea, but it works in some use cases.
See comment above about finding tools in recent relevant publications.