Hello everyone,
I'm attempting to reconstruct a 100 kb locus in 10 different genomes. My reference genome is closed, but the other genomes are split into contigs and have been annotated with Bakta, giving .gbk files.
I've tried sorting the contigs of each genome against the reference and creating a pseudomolecule using Artemis, followed by a re-annotation from scratch. However, comparing these new annotations with the originals has proven to be a challenge.
Could anyone suggest a solution or provide any advice on how to approach this problem more efficiently? Are there specific tools or strategies that I can employ to facilitate the comparison and reconstruction of the locus in these genomes?
I appreciate any help you can offer. Thank you in advance!
Is the de novo annotation required for each contig? Reconstructing homologous regions in many genomes can be tricky regardless, but is especially difficult if the assemblies are not highly contiguous already. For example, if each pseudomolecule is made up of many contigs, you don't know if they even belong next to each other in each assembly.
If annotations are already available, anchored synteny tools could be useful. Otherwise, SatsumaSynteny2 is an alignment-based synteny tool, but it doesn't overcome the problems associated with fragmented assemblies.
Thanks for your reply. My idea is to avoid having to re-annotate. What I have is a progressiveMauve alignment of the .gbk files. It would be ideal to have a tool that lets you extract the locus regions from what you see in Mauve. Do you know of any?
I don't understand your comment. What do you mean?
If you have an alignment already, why can't you use this to extract regions of interest?
I do not want to re-annotate the genome. I'm asking whether there is a way to extract the region aligned in Mauve, in .gbk format.
Here is a forum post that may answer your question.
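In case it helps, here is a rough sketch of one way to get coordinates out of the alignment, assuming the progressiveMauve result was saved in XMFA format, where each aligned block starts with header lines like `> 1:1000-5000 + ref.gbk` and blocks are separated by `=`. The function names, the example file names and the coordinates are made up for illustration. It only reports the boundaries of whole aligned blocks that overlap the locus in the reference, not an exact column-by-column mapping.

```python
import re
from typing import Dict, Iterable, List, NamedTuple


class Segment(NamedTuple):
    seq_index: int  # 1-based genome number, in the order given to Mauve
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    strand: str     # "+" or "-"


# Matches XMFA block headers such as "> 2:200-4100 - genomeA.gbk"
HEADER = re.compile(r"^>\s*(\d+):(\d+)-(\d+)\s+([+-])")


def parse_xmfa_blocks(lines: Iterable[str]) -> List[Dict[int, Segment]]:
    """Collect the header coordinates of every aligned block."""
    blocks: List[Dict[int, Segment]] = []
    current: Dict[int, Segment] = {}
    for line in lines:
        line = line.strip()
        if line.startswith("="):  # end of one aligned block
            if current:
                blocks.append(current)
            current = {}
            continue
        match = HEADER.match(line)
        if match:
            idx, start, end = (int(match.group(i)) for i in (1, 2, 3))
            if start == 0 and end == 0:  # genome absent from this block
                continue
            current[idx] = Segment(idx, start, end, match.group(4))
    if current:
        blocks.append(current)
    return blocks


def blocks_overlapping(blocks, ref_index: int, ref_start: int, ref_end: int):
    """Return the blocks whose reference segment overlaps the locus."""
    hits = []
    for block in blocks:
        ref = block.get(ref_index)
        if ref is not None and ref.start <= ref_end and ref.end >= ref_start:
            hits.append(block)
    return hits


# Toy alignment with invented coordinates
example = """#FormatVersion Mauve1
> 1:1000-5000 + ref.gbk
ACGT
> 2:200-4100 - genomeA.gbk
ACGT
=
> 1:9000-12000 + ref.gbk
ACGT
=
""".splitlines()

blocks = parse_xmfa_blocks(example)
hits = blocks_overlapping(blocks, ref_index=1, ref_start=4000, ref_end=6000)
for block in hits:
    for seg in block.values():
        print(seg.seq_index, seg.start, seg.end, seg.strand)
```

With the coordinates for each genome in hand, the matching part of the original Bakta .gbk could then be cut out without re-annotating, for example by reading the record with Biopython and slicing it, which keeps the features that lie entirely inside the slice. If a .gbk holds several contigs, the Mauve coordinates probably refer to the concatenated sequence, so it is worth checking that before slicing.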