7 weeks ago
m90
Hello everyone,
I have sequencing data from a bacterial isolate, and I want to assemble and annotate the genome. For my first pipeline, I used MEGAHIT for assembly and QUAST for quality assessment. Then I ran BUSCO to obtain the single-copy genes. BUSCO gave me a list of genes from my assembly along with their protein sequences, but it does not annotate them, so I still don't know what these genes are.
I have a few questions:
- Is there a tool that can take the list of genes from BUSCO to perform annotation, or do I need to do it manually?
- When I assemble short reads with SPAdes and the result indicates that the genome is not complete, how can I address this?
- Is there a reference-based assembly tool that I can use for assembly?
- I need a pipeline for assembly, gene prediction, and annotation of the isolated bacteria.
Thank you!
I'd suggest looking at shovill and prokka from Torsten Seemann (assuming you have Illumina short reads).

Some comments rather than an answer.
BUSCO only checks for universal single-copy orthologs; it does not perform full gene annotation. You'll need to pick one of the many genome annotation tools available. Check the publications for recent bacterial reference genomes to find relevant ones.
Yes, there are a few reference-based assembly tools, but they come with some pretty major caveats. For example, your assembly will inherit any mis-assembly in the reference, and it will follow the reference's structure even where your isolate has major genomic rearrangements. Usually this is not a good idea, but it works in some use cases.
See comment above about finding tools in recent relevant publications.
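
As a rough illustration of the shovill and prokka route suggested above, here is a minimal Python sketch that builds the two command lines and optionally runs them. It assumes paired-end Illumina reads and that both tools are installed and on your PATH. The read file names, the sample name, and the helper functions `build_pipeline_commands` and `run_pipeline` are hypothetical, made up for this example. The flags shown are the commonly documented ones, but check `shovill --help` and `prokka --help` for your installed versions before relying on them.

```python
import shlex
import subprocess
from pathlib import Path


def build_pipeline_commands(r1, r2, outdir, sample="isolate", cpus=4):
    """Return the shovill and prokka commands as argument lists (hypothetical helper)."""
    outdir = Path(outdir)
    assembly_dir = outdir / "shovill"
    annotation_dir = outdir / "prokka"

    # Step 1: assemble paired-end Illumina reads with shovill.
    # shovill writes its final assembly to contigs.fa inside --outdir.
    shovill_cmd = [
        "shovill",
        "--R1", str(r1),
        "--R2", str(r2),
        "--outdir", str(assembly_dir),
        "--cpus", str(cpus),
    ]

    # Step 2: predict and annotate genes on the assembled contigs with prokka.
    prokka_cmd = [
        "prokka",
        "--outdir", str(annotation_dir),
        "--prefix", sample,
        "--cpus", str(cpus),
        str(assembly_dir / "contigs.fa"),
    ]
    return [shovill_cmd, prokka_cmd]


def run_pipeline(commands):
    """Run each command in order and stop at the first failure (hypothetical helper)."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Assumed file names; replace with your own reads.
    commands = build_pipeline_commands(
        "reads_R1.fastq.gz", "reads_R2.fastq.gz", "results", sample="my_isolate"
    )
    # Print the commands for inspection; call run_pipeline(commands) to execute them.
    for cmd in commands:
        print(shlex.join(cmd))
```

Building the commands separately from running them makes it easy to inspect what will be executed first. You can still run QUAST and BUSCO on the resulting contigs afterwards, as in the original pipeline.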